Softpanorama


qstat


Introduction

The qstat command provides the status of all jobs and queues in the cluster, though not always in the best format for troubleshooting. See also Duke University Tools for SGE.

To see the state of your jobs, use the qstat command. Executed without options it shows the jobs of the current user; use qstat -u '*' to show the jobs of all users.

Use qstat -u '*' -f to display a more detailed list of jobs within SGE. You can also use qstat to query the status of a job given its job ID: use the -j _N_ option, where _N_ is the job ID.

For bash shell users, the following can be added to your .profile file:
alias qstat='qstat -u "*"'
There are several reasons why a job will not run. The first is the job's resource requirements: the cluster may be full, and you have to wait for available resources (the necessary number of cores, etc.). It is also possible that the job experienced an error in the run script, in which case the status would be "Eqw". You can query a job's status by entering the following:
 qstat -explain c -j _N_ 

where _N_ is the Grid Engine job id.
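
To find which jobs are in an error state before querying them one by one, the default job listing can be filtered on its state column. The following is a minimal sketch, not part of Grid Engine: list_error_jobs is a hypothetical helper name, and it assumes the default column order (job-ID, prior, name, user, state, ...) shown elsewhere on this page.

```shell
# Print the IDs of jobs whose state column contains "E" (for example "Eqw").
# Reads a default qstat job listing on standard input.
# Assumption: the state is the fifth whitespace-separated column.
list_error_jobs() {
    awk '$1 ~ /^[0-9]+$/ && $5 ~ /E/ { print $1 }'
}

# Typical use on a cluster (not run here):
#   qstat -u '*' | list_error_jobs
```

Each ID printed can then be passed to qstat -explain c -j as described above.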

The most useful options

The most useful options are:

Other useful commands:

qstat -explain c -j job-id               specific job status
qdel job-id                              delete job
qsub -l h_vmem=### job.sh                mem limit, see queue_conf(5) RESOURCE LIMITS
qstat -f -u "*"                          detailed list of jobs for all users

Format of the qstat report

By default qstat shows a list of jobs running in the queue and their state. The output will look like this

job-ID  prior    name  user  state  submit/start at      queue      slots  ja-task-ID
--------------------------------------------------------------------------------------
 74950  0.55500  tut1  eid   qw     12/01/2007 20:42:45  mpi.q7@b1  1

It lists the job ID, priority, name, user, state, submission or start time, queue, and CPU slots used by the job.
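
Because the report is column oriented, it is easy to summarize with awk. The sketch below counts jobs per state; count_by_state is a hypothetical helper name, and the code assumes the column order of the sample above, with the state in the fifth column.

```shell
# Count jobs per state in a default qstat job listing read from standard input.
# Lines whose first field is not a numeric job ID (header, separator) are skipped.
count_by_state() {
    awk '$1 ~ /^[0-9]+$/ { n[$5]++ } END { for (s in n) print s, n[s] }' | sort
}

# Typical use on a cluster (not run here):
#   qstat -u '*' | count_by_state
```

The output is one line per state, for example one line for "qw" and one for "r", each followed by the number of jobs in that state.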

For SGE version 6 and later, qstat by default prints the following columns of information for each running and pending job in the system:

qstat without arguments will print the status of all jobs in the queue.

State of the job

States :

Combinations

 The states of the queue

The queue can be in the following states or some combination of those states:

qstat -j

qstat -j [job_id]: gives the reason why the pending job (if any) is not being scheduled. qstat -j [job_list] prints, either for all pending jobs or for the jobs contained in job_list, the reason for not being scheduled. Additional information can be obtained by looking at the man page for qstat: type "man qstat".

You can view individual jobs using the

qstat -j <job_id>

option. The output in this case is much more verbose, and includes information about the state of the job, and queuing considerations. You can also use the

qstat -u <user_id>

option to see only your jobs.

qstat -f

One final option is to use the

qstat -f

option to see the status of the queues on the systems. Output of qstat -f and output qstat -u '*' -f are somewhat different. 

In the latter case qstat adds additional lines that contain the job number. For example:

qstat -u '*' -f | grep -B 1 1357
blades.q@b10 BIP 0/20/20 1.00 lx-amd64 a
1357 0.50500 T665-VASP medea r 10/08/2014 19:54:37 20
This allows you to see which nodes a multi-node job occupies:
qstat -u '*' -f | grep -B 1 1356
blades.q@b2              BIP   0/20/20        1.00     lx-amd64      a
   1356 0.50500 T664-VASP  medea        r     10/08/2014 16:11:22    20
--
blades.q@b5              BIP   0/20/20        1.00     lx-amd64      a
   1356 0.50500 T664-VASP  medea        r     10/08/2014 16:11:22    20
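
The grep -B 1 approach relies on the job line directly following its queue line. A slightly more robust sketch in awk remembers the most recent queue instance and prints it whenever the requested job ID appears. job_hosts is a hypothetical helper name, and the code assumes that queue instance lines are the only lines whose first field contains "@" and that section banners (such as the pending jobs banner) start with "#".

```shell
# Print the queue instances on which a given job is running.
# Usage: qstat -f -u '*' | job_hosts JOB_ID
job_hosts() {
    awk -v id="$1" '
        /^#/       { qi = ""; next }     # section banner: forget the last queue
        $1 ~ /@/   { qi = $1; next }     # queue instance line, e.g. blades.q@b2
        $1 == id && qi != "" { print qi }  # job line under that queue instance
    '
}
```

For the listing above, job_hosts 1356 would print blades.q@b2 and blades.q@b5, one per line.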

If there are multiple queues, you can filter out unneeded output by specifying the queue:

qstat -q <queue_name> -f

sge_qstat -- the default qstat options file

sge_qstat defines the command line switches that will be used by qstat by default. If available, the default sge_qstat file is read and processed by qstat(1).

There is a cluster global and a user private sge_qstat file. The user private file has the highest precedence and is followed by the cluster global sge_qstat file. Command line switches used with qstat(1) override all switches contained in the user private or cluster global sge_qstat file.

The default sge_qstat file may contain an arbitrary number of lines, although it is unclear what value lines after the first have. Blank lines and lines with a '#' sign in the first column are skipped. Each line can contain a set of qstat(1) options; more than one option per line is allowed.

Here is an example of a sge_qstat default options file (note the leading blank before the first "-"):

=====================================================
# Just show me my own running and suspended jobs
 -s rs -u $USER
=====================================================
With a default sge_qstat file like this, running qstat without parameters
qstat
has the same effect as if qstat was executed with:
qstat -s rs -u <current_user>
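
A small helper can create such a file. This is only a sketch: write_qstat_defaults is a hypothetical name, and the location of the user private file is assumed here to be $HOME/.sge_qstat (check sge_qstat(5) on your installation). The target path is passed as an argument so the helper can be tried on a scratch file first.

```shell
# Write the example defaults file shown above to the path given as $1.
# The quoted 'EOF' keeps $USER literal, exactly as in the example.
write_qstat_defaults() {
    cat > "$1" <<'EOF'
# Just show me my own running and suspended jobs
 -s rs -u $USER
EOF
}

# Hypothetical use, assuming the private file lives in the home directory:
#   write_qstat_defaults "$HOME/.sge_qstat"
```

Since command line switches override the defaults file, qstat -u '*' still shows the jobs of all users after the file is in place.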


Old News ;-)

[May 07, 2017] Monitoring and Controlling Jobs

biowiki.org

After submitting your job to Grid Engine you may track its status by using either the qstat command, the GUI interface QMON, or by email.

Monitoring with qstat

The qstat command provides the status of all jobs and queues in the cluster. The most useful options are:

You can refer to the man pages for a complete description of all the options of the qstat command.

Monitoring Jobs by Electronic Mail

Another way to monitor your jobs is to have Grid Engine notify you by email about the status of the job.

In your batch script or from the command line, use the -m option to request that an email be sent and the -M option to specify the email address to which it should be sent. This will look like:

#$ -M myaddress@work
#$ -m beas

The -m option selects the events after which you want to receive email: you can be notified at the beginning (b) or end (e) of the job, or when the job is aborted (a) or suspended (s), as in the sample script lines above.

And from the command line you can use the same options (for example):

qsub -M myaddress@work -m be job.sh

How do I control my jobs

Based on the status of the job displayed, you can control the job by the following actions:

Monitoring and controlling with QMON

You can also use the GUI QMON, which provides a window dialog specifically designed for monitoring and controlling jobs; its buttons are self-explanatory.


For further information, see the SGE User's Guide (PDF, HTML).


Show only jobs that are not on hold in qstat

I'm running some jobs on an SGE cluster. Is there a way to make qstat show me only jobs that are not on hold?

qstat -s p shows pending jobs, which is all those with state "qw" and "hqw".

qstat -s h shows hold jobs, which is all those with state "hqw".

I want to be able to see all jobs with state "qw" only and NOT state "hqw". The man pages seem to suggest it isn't possible, but I want to be sure I didn't miss something. It would be REALLY useful and it's really frustrating me that I can't make it work.

Other cluster users have a few thousand jobs on hold ("hqw") and only a handful actually in the queue waiting to run ("qw"). I want to see quickly and easily the stuff that is not on hold so I can see where my jobs are in the queue. It's a pain to have to show everything and then scroll back up to find the relevant part of the output.


asked Feb 14 at 23:08

Laura

So I figured out a way to show what I want by piping the output of qstat into grep:

qstat -u "*" | grep " qw"

(Note that I need to search for " qw" not just "qw" or it will return the "hqw" states as well.)

But I'd still love to know if it's possible using qstat options only.
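
As a variation on the grep approach, awk can compare the state column exactly, which avoids accidental matches on job names or user names that happen to contain " qw". This is a sketch rather than a qstat-only answer: pending_not_held is a hypothetical helper name, and it assumes the state is the fifth column of the default listing.

```shell
# Keep only jobs whose state is exactly "qw" (pending and not on hold).
# Reads a default qstat job listing on standard input; header lines are dropped.
pending_not_held() {
    awk '$1 ~ /^[0-9]+$/ && $5 == "qw"'
}

# Typical use on a cluster (not run here):
#   qstat -u '*' | pending_not_held
```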

SGE quick and dirty how to find jobs on 'bad' slots, by yakshaving

Nov 26, 2007 YakShaving Shawn Ferry's Weblog
I occasionally have a need to find queues in Sun Grid Engine that are in one of the possibly problematic states which have an occupied slot. It is just infrequent enough that I don't remember exactly how I did it the last time.

qstat -f | awk '$6~/[cdsuE]/ && $3!~/^[0]/'
queuename                      qtype used/tot. load_avg arch          states
[email protected]   BIP   1/1       -NA-     sol-amd64     adu
[email protected]   BIP   1/1       -NA-     sol-amd64     adu

An alternative is "qstat -f | awk '$6~/[cdsuE]/ && $3~/^[1-9]/'", which also avoids printing the header line. In the first command the header line is printed because 'states' in $6 matches 's' and 'used/tot.' in $3 does not begin with '0'.

The possibly more elegant 'qstat -f -qs cdsuE' still requires a second comparison in awk of '$0!~/--/' to filter out the queue separator lines. (qstat -f -qs acduE | awk '$0!~/--/ && $3!~/^[0]/')
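
The one-liners above look at the first number of the slot field. The qstat -f samples on this page show that field in two shapes, 1/1 and 0/20/20; on newer Grid Engine releases the three-part form is, as far as I know, resv/used/tot., so the first number is the reservation count rather than the used count. The sketch below takes the second-to-last component as the used slots and so handles both shapes. busy_bad_queues is a hypothetical helper name, and the column positions are assumed to match the samples.

```shell
# List queue instances that are in a problematic state (c, d, s, u or E)
# and still have at least one used slot. Reads qstat -f output on stdin.
busy_bad_queues() {
    awk '
        $1 ~ /@/ && $6 ~ /[cdsuE]/ {
            n = split($3, slots, "/")   # used/tot. or resv/used/tot.
            used = slots[n - 1]         # second-to-last component is "used"
            if (used + 0 > 0) print $1, $3, $6
        }
    '
}

# Typical use on a cluster (not run here):
#   qstat -f | busy_bad_queues
```

Matching on "@" in the first field skips the header, the separator lines and the job lines, so no extra filter is needed.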


Finally because I can never remember what exactly all the queue states are and the qstat man page doesn't have the nice table:


aoACD – Number of queue instances that are in at least one of the following states:
a – Load threshold alarm
o – Orphaned
A – Suspend threshold alarm
C – Suspended by calendar
D – Disabled by calendar

cdsuE – Number of queue instances that are in at least one of the following states:
c – Configuration ambiguous
d – Disabled
s – Suspended
u – Unknown
E – Error

Job State/Status:

d(eletion), E(rror), h(old), r(unning), R(estarted), s(uspended), S(uspended), t(ransferring), T(hreshold) or w(aiting).

References: SGE (N1GE 6.0) -- Monitoring and Controlling Queues



Copyright © 1996-2021 by Softpanorama Society. www.softpanorama.org was initially created as a service to the (now defunct) UN Sustainable Development Networking Programme (SDNP) without any remuneration. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License. Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.


Last modified: July, 28, 2019