Excluding host from scheduling
You cannot remove a host from the execution host list while jobs are running on it. But you can disable the host in a particular queue, even while jobs are running on it, using the qmod command:
qmod -d blades.q@b8 # (-d == disable)
root@z99 changed state of "blades.q@b8" (disabled)
After maintenance is done you can re-enable it:
qmod -e blades.q@b8 # enable b8 in queue blades.q
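The disable/enable pair can be wrapped in a small helper so that the same queue instance name is used at both ends of a maintenance window. This is only a sketch: the function name host_maintenance and the DRY_RUN variable are assumptions made for illustration, not part of Grid Engine.

```shell
#!/bin/sh
# Hypothetical helper: disable or re-enable one queue instance (queue@host).
# With DRY_RUN set to a non-empty value the qmod command is printed, not run.
host_maintenance() {
    # usage: host_maintenance begin|end QUEUE HOST
    if [ $# -ne 3 ]; then
        echo "usage: host_maintenance begin|end QUEUE HOST" >&2
        return 2
    fi
    case "$1" in
        begin) flag=-d ;;   # disable: no new jobs are dispatched
        end)   flag=-e ;;   # enable: scheduling resumes
        *)     echo "unknown action: $1" >&2; return 2 ;;
    esac
    if [ -n "$DRY_RUN" ]; then
        printf '%s\n' "qmod $flag $2@$3"
    else
        qmod "$flag" "$2@$3"
    fi
}
```

Running it once with DRY_RUN=1 (for example DRY_RUN=1 host_maintenance begin blades.q b8) shows the exact qmod call before anything is changed on the cluster.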
You can also disable all hosts of a particular queue at once. The scheduler will then place no new jobs on the hosts defined in that queue, and you can wait for any active jobs to finish before you run the shutdown procedure.
qmod -d b16.q
To suspend a queue use the qmod -sq command:
qmod -sq all.q
You can use a wildcard pattern to select queues. Quote it so that the shell does not expand it. For example:
qmod -sq '*'
suspends all existing queues. This is convenient for performing maintenance.
For information about queues, see Creating and modifying SGE Queues
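Note that qmod -sq also suspends the jobs that are running in the queue, so they will not finish while the queue is suspended. To drain a queue, use qmod -d instead: it prevents new jobs from being scheduled to the disabled queue instances while running jobs continue. You should then wait until no jobs are running in the queue instances before you kill the daemons.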
To unsuspend you can use the command
qmod -usq '*'
You can also remove the host from the host list of the queue, which prevents any further scheduling to this host. This becomes a problem when you have many queues; in that case you should define the queue host lists via hostgroups, so that the host has to be removed in only one place.
You can use the softstop option of the execution daemon init script. Along with the regular "stop", the init script for the execution daemon accepts the option softstop, which preserves running jobs because it does not kill the shepherd processes:
# This script can be called with the following arguments:
#
#    start     start execution daemon
#    stop      Terminates the execution daemon
#              and the shepherd. This only works if the execution daemon
#              spool directory is in the default location.
#    softstop  do not kill the shepherd process
#
# Unix commands which may be used in this script:
#    cat cut tr ls grep awk sed basename
#
# This script requires the script $SGE_ROOT/util/arch
#
PATH=/bin:/usr/bin:/sbin:/usr/sbin
You can also run a "suicidal" script on all cores of the node, naming the node explicitly at submission.
The script must run as root or under the SGE admin account. It needs only two commands: sleep 1, followed by service sgeexecd.644 stop.
qsub -q blades.q@b8 suicidal_job.sh
This way you can shut down the execution daemon.
Option -q provides a lot of flexibility, including the ability to use wildcard patterns; see the wc_queue definition in sge_types(1). For comparison, the PBS version of qsub documents its -q destination option as follows. Note the references to PBS_DEFAULT and the PBS ERS: this excerpt describes PBS, not Grid Engine.
Defines the destination of the job. The destination names a queue, a server, or a queue at a server.
The qsub command will submit the script to the server defined by the destination argument. If the destination is a routing queue, the job may be routed by the server to a new destination.
If the -q option is not specified, the qsub command will submit the script to the default server. See PBS_DEFAULT under the Environment Variables section on this man page and the PBS ERS section 2.7.4, "Default Server".
If the -q option is specified, it is in one of the following three forms:
queue
@server
queue@server
If the destination argument names a queue and does not name a server, the job will be submitted to the named queue at the default server.
If the destination argument names a server and does not name a queue, the job will be submitted to the default queue at the named server.
If the destination argument names both a queue and a server, the job will be submitted to the named queue at the named server.
# as user:
qstat | grep neteler | tr -s ' ' ' ' | cut -d' ' -f2 > /tmp/to_suspend.sge
cat /tmp/to_suspend.sge

# as root (?):
su -
for i in `cat /tmp/to_suspend.sge` ; do qmod -sj $i ; done
qstat

# remove crashed blade from list of execution hosts:
qconf -de blade14
# delete host from list (opens the host group in an editor):
qconf -mhgrp "@allhosts"
# show the new list:
qconf -shgrp "@allhosts"
# verify queue stats:
qstat -f

# resubmit jobs to other nodes (as job user!!):
exit
for i in `cat /tmp/to_suspend.sge` ; do qresub $i ; done
qstat

This command sends a signal to a running job:

qmod -sj | -usj | -cj   (suspend | unsuspend | clear error)

qmod -sj 3312136
qmod -usj 3312136
root - unsuspended job 3312136
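The job ID extraction above matches the user name anywhere on the line, so it also catches jobs of other users whose job name happens to contain the same string. A minimal sketch of a column-based alternative follows. The function name job_ids_for_user is hypothetical, and the sketch assumes the default qstat listing with two header lines, the job ID in column 1 and the owner in column 4.

```shell
#!/bin/sh
# Hypothetical helper: read default `qstat` output on stdin and print the
# job IDs that belong to one user.
# Assumes two header lines, the job ID in column 1 and the owner in column 4.
job_ids_for_user() {
    awk -v user="$1" 'NR > 2 && $4 == user { print $1 }'
}

# Example (not run here):
#   qstat | job_ids_for_user neteler > /tmp/to_suspend.sge
```

If your qstat prints columns in a different order, adjust the field numbers accordingly.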
Earthwithsun.com
Q: I'm having some trouble with a specific node. Until I resolve it, I don't want any jobs to run on it. How can I temporarily take this node out of the node "pool"?
A1: To disable:
qmod -d '*@node_name'
To re-enable:
qmod -e '*@node_name'
A2: Without knowing your SGE version I cannot say for certain that this will achieve the desired outcome, however,
qconf -de foo
will delete the execution host foo.
qconf -ae foo
will then add the host foo back to the execution list.
If you're running 6.1 or better, here's the best way. Create a new hostgroup called @disabled:
qconf -ahgrp @disabled
Create a new resource quota set with
qconf -arqs
containing the rule
limit hosts @disabled to slots=0
Now, to disable a host, just add it to the host group
qconf -aattr hostgroup hostlist MYHOST @disabled
To reenable the host, remove it from the host group
qconf -dattr hostgroup hostlist MYHOST @disabled
This process will stop new jobs from being scheduled to the machine and allow the currently running jobs to complete.
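The two qconf calls for the @disabled host group can be wrapped so that disabling and re-enabling a host become one-word operations. This is a sketch only: disable_host, enable_host, QCONF and DISABLED_GROUP are names invented for illustration, and it assumes the host group and the zero-slot resource quota set described above already exist.

```shell
#!/bin/sh
# Hypothetical wrappers around qconf -aattr / -dattr for a zero-slot group.
# QCONF and DISABLED_GROUP are assumed names; they only make the command
# and the group name overridable.
QCONF=${QCONF:-qconf}
DISABLED_GROUP=${DISABLED_GROUP:-@disabled}

disable_host() {
    # add the host to the host group that is limited to slots=0
    "$QCONF" -aattr hostgroup hostlist "$1" "$DISABLED_GROUP"
}

enable_host() {
    # take the host out of the group again
    "$QCONF" -dattr hostgroup hostlist "$1" "$DISABLED_GROUP"
}
```

Keeping the group name in one variable avoids typing @disabled differently in the add and the remove step.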
I have more than 200 jobs I need to submit to an SGE cluster. I'll be submitting them into two queues. One of the queues has a machine that I don't want to submit jobs to. How can I exclude that machine? The only thing I found that might be helpful is (assuming three valid nodes available to q1 and all the available nodes for q2 are valid):
qsub -q q1.q@n1 q1.q@n2 q1.q@n3 q2.q
Assuming the node you don't want to run on is called n4, adding the following to your script should work:
#$ -l h=!n4
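When more than one host has to be avoided, the same h= request can be generated instead of typed by hand. The sketch below is built on an assumption: that your Grid Engine version accepts logical expressions such as !(n4|n7) in resource requests, which should be checked against the sge_types man page of your installation. The function name exclude_hosts_request is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: build the value for "qsub -l" that excludes hosts.
# One host gives h=!n4; several give h=!(n4|n7). The multi-host form is an
# assumption about expression support in your Grid Engine version.
exclude_hosts_request() {
    if [ $# -eq 0 ]; then
        echo "usage: exclude_hosts_request HOST..." >&2
        return 2
    fi
    if [ $# -eq 1 ]; then
        printf 'h=!%s\n' "$1"
        return 0
    fi
    list=$1
    shift
    for h in "$@"; do
        list="$list|$h"
    done
    printf 'h=!(%s)\n' "$list"
}

# Example (not run here); the double quotes keep the shell from
# interpreting the parentheses and the pipe character:
#   qsub -l "$(exclude_hosts_request n4 n7)" my_prog.sh
```

On an interactive bash command line a bare ! can trigger history expansion, so when typing the request directly it is safer to single-quote it, as in -l 'h=!n4'.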
The best way I've found for this is to set up a custom resource on the nodes that you want to allow the execution on, then require that resource when you submit the job.
In qmon, go to the "complex" configuration and add a new attribute. Set the name to something like "my_test" and the shortcut to something like "m_t", the type to BOOL, the relation to ==, requestable to Yes, consumable to No, and "Add" it. Commit your changes to the complex configuration.
The next step is probably easier to do from the command line, but you can do it in qmon as well. You need to add your new complex to each host that you're going to allow your job to run on. In qmon, you can go to the host configuration, select execution host, and open each host in turn, clicking on the consumables/fixed attributes tab and adding the new complex that you just configured above with "True" as the value. From the command line, you can get a list of your execution hosts with "qconf -sel". This list is suitable for passing to a loop and grepping out the host(s) you don't want included. Do something like this:
qconf -sel | grep -v host_to_exclude | while read host; do
EDITOR="ed" qconf -me $host <<EOL
/complex_values/s/$/,my_test=True/
w
q
EOL
done
This lets you programmatically edit the host (not normally allowed by qconf, as it wants to start up your editor for you). It does this by setting the editor to "ed" (you'll have to make sure you have the ed editor installed; try running it by hand first and type "q" to get out). ed takes the list of editing commands on its stdin, so we give it three commands. The first edits the line with the complex_values on it to include the my_test value. The second writes out the temporary file and the third quits ed.
Once you've done this, submit your jobs with a resource request that requires your new complex:
qsub -q whatever -l my_test=True my_prog.sh
The -l option requests a resource, and my_test=True says the job can only run on hosts that have the complex my_test with a value of True. Since the complex isn't consumable, the scheduler can still run as many jobs on each host as it wants to (up to the slot limit for the host), but it will avoid any hosts that don't have the my_test complex set to True.
There is a nice bypass to this.
Generate a simple bash file:
#!/bin/bash
sleep 6000 # replace 6000 with any long period of time that will be enough to submit your jobs
Submit this job to the node you wish to exclude, as many times as needed to fully occupy it.
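For the custom resource approach described earlier, the per-host edit can also be expressed with qconf -aattr, which changes one attribute without starting an editor. The sketch below only prints the commands, so they can be reviewed before being run. The function name tag_hosts_cmds is hypothetical, and my_test=True and host_to_exclude are the placeholder names from the example above.

```shell
#!/bin/sh
# Hypothetical generator: read host names on stdin (e.g. from "qconf -sel")
# and print one "qconf -aattr" command per host, skipping the excluded host.
# Nothing is executed here; review the output, then pipe it to sh.
tag_hosts_cmds() {
    # usage: tag_hosts_cmds COMPLEX=VALUE HOST_TO_EXCLUDE
    attr=$1
    skip=$2
    while read -r host; do
        [ -z "$host" ] && continue
        [ "$host" = "$skip" ] && continue   # exact match, unlike grep -v
        printf 'qconf -aattr exechost complex_values %s %s\n' "$attr" "$host"
    done
}

# Example (not run here):
#   qconf -sel | tag_hosts_cmds my_test=True host_to_exclude | sh
```

Comparing whole host names avoids one pitfall of grep -v: excluding n4 with grep would also drop n14 and n40.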
Copyright © 1996-2021 by Softpanorama Society. www.softpanorama.org was initially created as a service to the (now defunct) UN Sustainable Development Networking Programme (SDNP) without any remuneration. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License. Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.
Last modified: April 12, 2019