We will assume that the installation is performed on RHEL 6.5 or 6.6. We also assume that NFS is used for sharing files with the master host.
The degree of sharing is not that important, but generally $SGE_ROOT/$SGE_CELL should be shared. The efficiency considerations cited by many are overblown, and without careful measurements that identify the real bottleneck you might fall into the classic trap called "premature optimization". As Donald Knuth put it, "premature optimization is the root of all evil", and long before him Talleyrand gave the following advice to young diplomats: "First and foremost, not too much zeal". Just substitute "novice SGE administrators" for "young diplomats".
The same applies to the choice between classic spooling and Berkeley DB. Without measurements, selecting Berkeley DB is fool's gold.
Before installing an execution host, you first need to install and configure the master. The master should be running during installation of an execution host.
The installation process is pretty simple:
That's why you need to register the daemon with chkconfig and start it manually. If you share the whole $SGE_ROOT tree or the $SGE_ROOT/$SGE_CELL subtree via NFS, there is no need to run the execution daemon setup script, although this is the least time-consuming part of the installation. Here is how Dave Love explained the situation:
== Set Up ==
After installing the distribution into place you need to run the installation script `inst_sge` (or a wrapper) on your master host, and on all execution hosts which don't share the installation via a distributed file system. This will configure and start up the daemons.
If the installation is shared with the execution hosts, using a shared, writable spool directory, it isn't necessary to do an explicit install on them. Use `inst_sge` to install the execd on the master host and copy to or share with the hosts the `sgeexecd` init script.
Then start this after the qmaster has been configured with the relevant host names (with `qconf -ah`).
You may or may not want to keep the execd on the master host for maintaining global load values, for instance, but you probably want to ensure it has zero slots, so as not to be able to run jobs.
You can install as many execution hosts as you want using the GUI installer: just add as many hosts as you wish and they will be installed one by one in one batch.
On the execution host you need to check the following four preconditions:
Register and patch the server. You might need access to a repository for the installation. I am not sure that the necessary RPMs are present on the RHEL installation DVD.
Update or install Java. Usually Java is already installed, but you still need to verify that. If it is not, you need to install it.
Configure NTP. Check using the command: ntpdate -u ntp1.firm.com
In addition, you need to perform several steps to ensure a proper environment for the execd.
For installation of an execution host you need a running Grid Engine qmaster.
You can add an administrative host with the command:
qconf -ah <hostname>
If you use an RPM-based installation, then mounting the NFS directory is a prerequisite for installing the RPMs and as such is already met. In the case of a tar-file-based installation you need to do it now, if this is the first execution host in the cluster.
First you need to decide how much of the $SGE_ROOT tree you want to share. For small installations the master host can also serve as the NFS server, but of course you can get better results using a specialized server. Select how much you need to share. The simplest SGE installations for small clusters share either
For an RPM-based installation the second method is preferable, as you have already installed the binaries on the host: it makes no sense to put executables on NFS, so the minimum is the $SGE_ROOT/$SGE_CELL/common directory. Of course, in this case you need to install executables on each execution host. But as installing the prerequisite RPMs is enough trouble (and you need to do it in any case), two more RPMs do not make much difference.
Generally, for small installations (say, fewer than 32 nodes and 640 cores) that do not have a huge load of small jobs (several minutes long), there is not much difference whether you share $SGE_ROOT or $SGE_ROOT/default. In this particular case sharing less does not really improve efficiency on modern servers. See Usage of NFS in Grid Engine.
Create the directory for shared files (for example /opt/sge/default) and put an appropriate line in the /etc/fstab file.
export SGE_ROOT=/opt/sge            # SGE installation root directory
export SGE_CELL=default             # cell directory
SGE_MASTER=qmaster                  # hostname of your SGE master host; should be in /etc/hosts
SGE_NFS_SHARE=$SGE_ROOT/$SGE_CELL   # specify how much you share
mkdir -p $SGE_NFS_SHARE
# the last two fields are 0 0: NFS mounts are neither dumped nor checked by fsck
echo "$SGE_MASTER:$SGE_NFS_SHARE $SGE_NFS_SHARE nfs vers=3,rw,hard,intr,tcp,rsize=32768,wsize=32768 0 0" >> /etc/fstab
mount $SGE_NFS_SHARE
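Rerunning the fstab step appends a second copy of the same line. A minimal sketch of an idempotent variant follows; the function name add_nfs_fstab_entry and its optional third argument (an alternative fstab path, handy for dry runs) are hypothetical, the mount options repeat the ones used in the fstab line above, and the trailing 0 0 (no dump, no fsck) is this sketch's choice.

```shell
# Hypothetical helper: append an NFS entry to an fstab-style file only once.
# Arguments: master host, shared directory, fstab file (defaults to /etc/fstab).
add_nfs_fstab_entry() {
    local master=$1 share=$2 fstab=${3:-/etc/fstab}
    local opts="vers=3,rw,hard,intr,tcp,rsize=32768,wsize=32768"
    # An entry counts as present when a non-comment line already uses
    # this directory as its mount point (second field).
    if awk -v mp="$share" '$1 !~ /^#/ && $2 == mp { found = 1 } END { exit !found }' "$fstab"; then
        echo "entry for $share already present in $fstab"
        return 0
    fi
    echo "$master:$share $share nfs $opts 0 0" >> "$fstab"
}
```

Trying it against a scratch copy first, for example add_nfs_fstab_entry qmaster /opt/sge/default /tmp/fstab.test, shows the resulting line without touching the real /etc/fstab.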
Now you need to add the host to the /etc/exports file on the master host or NFS server if you share by host, not by netmask. For example:
/opt/sge 10.194.186.254(rw,no_root_squash) 10.194.181.26(rw,no_root_squash)
After that, restart the NFS daemon so that it rereads the exports file:
# service nfs restart
Shutting down NFS mountd: [ OK ]
Shutting down NFS daemon: [ OK ]
Shutting down NFS quotas: [ OK ]
Shutting down NFS services: [ OK ]
Starting NFS services: [ OK ]
Starting NFS quotas: [ OK ]
Starting NFS daemon: [ OK ]
Starting NFS mountd: [ OK ]
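When the list of execution hosts grows, the per-host exports line is easier to generate than to type. The sketch below only prints such a line; the function name make_exports_line is hypothetical and the options are the ones shown in the example above.

```shell
# Hypothetical helper: print an /etc/exports line for a directory and a list of hosts.
# Arguments: exported directory, then one or more host names or IP addresses.
make_exports_line() {
    local dir=$1; shift
    local line=$dir host
    for host in "$@"; do
        # every host gets the same options as in the example above
        line="$line $host(rw,no_root_squash)"
    done
    echo "$line"
}
```

The output can be reviewed and then pasted into /etc/exports. On most Linux NFS servers, exportfs -ra rereads the exports file without a full restart of the NFS service; whether that is sufficient on your system is worth checking.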
If this was not done before, universally for all execution hosts, you need to create a passwordless login environment now. This might be the case when this is the first execution host you are installing. In all other cases the ssh keys were already generated, and you just need to copy /root/.ssh/authorized_keys from any working execution host.
Tip: If you already have an execution host configured, just copy the authorized_keys file from it.
cd /root/.ssh
scp sge01:/root/.ssh/authorized_keys .
Check ssh access from the master host to the node on which you install the execution host (b5 in the example below):
root@m17: # ssh b5
The authenticity of host 'b5 (10.194.181.46)' can't be established.
RSA key fingerprint is 18:35:6e:96:11:77:27:fc:ac:1c:8e:46:36:2b:ae:2b.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'b5,10.194.181.46' (RSA) to the list of known hosts.
Last login: Thu Jul 26 08:29:41 2012 from sge_master.firma.net
Check that the sgeadmin user is present, if you are using it, and that it has a GID and UID identical to the sgeadmin account on the master host. The execution host installer does not check this, and in the end your execution host will not be able to communicate with the master; you might not know what to do, having missed this step and forgotten about this tricky error. Typically RPM-based installations create this user as part of the RPM install, but tar-based installations do not, and if you want to use it, you need to create it manually.
You also need to verify that the UIDs and GIDs of all user and application accounts are identical to those on the master host. Tricky errors can arise, especially if the sgeadmin user's UID and GID differ or the user ID for an important application does not match.
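Since the installer does not check for matching IDs, a small comparison can be scripted. This is a sketch under assumptions: the function names passwd_ids and check_ids_match are hypothetical, the accounts are assumed to be resolvable with getent on both machines, and passwordless ssh to the master is assumed to work already.

```shell
# Print "uid:gid" for one user from passwd-format input on stdin.
passwd_ids() {
    awk -F: -v u="$1" '$1 == u { print $3 ":" $4; exit }'
}

# Compare a user's UID and GID on this host with those on the master.
# Arguments: user name, master host name.
check_ids_match() {
    local user=$1 master=$2 here there
    here=$(getent passwd "$user" | passwd_ids "$user")
    there=$(ssh "$master" getent passwd "$user" | passwd_ids "$user")
    if [ -n "$here" ] && [ "$here" = "$there" ]; then
        echo "OK: $user is $here on both hosts"
    else
        echo "MISMATCH: $user local=${here:-missing} master=${there:-missing}"
        return 1
    fi
}
```

For example, check_ids_match sgeadmin qmaster could be run on the execution host before starting the installer, and repeated for each application account that submits jobs.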
If this is a reinstallation, make sure that the old version of the execd daemon is not running and the new version of the qmaster daemon is running. Also remove the old start files from /etc/init.d.
Check /etc/services. If you installed SGE from RPMs, such as the Son of Grid Engine RPMs, the entries are typically added by the RPMs. If necessary, manually add the ports that you used during configuration of the SGE master. For example:
sge_qmaster 6444/tcp # Grid Engine Qmaster Service
sge_qmaster 6444/udp # Grid Engine Qmaster Service
sge_execd 6445/tcp # Grid Engine Execution Service
sge_execd 6445/udp # Grid Engine Execution Service
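The check of /etc/services can be scripted. The sketch below reports which of the two service names lack a tcp entry; the function name missing_sge_services is hypothetical, and only the service names from the example above are assumed.

```shell
# Hypothetical helper: print the SGE service names that have no tcp entry.
# Argument: services file to inspect (defaults to /etc/services).
missing_sge_services() {
    local services=${1:-/etc/services} name
    for name in sge_qmaster sge_execd; do
        # an entry is a line whose first field is the service name
        # and whose second field ends in /tcp
        awk -v n="$name" '$1 == n && $2 ~ /\/tcp$/ { found = 1 } END { exit !found }' "$services" \
            || echo "$name"
    done
}
```

Empty output means both entries are present; any name printed still needs a line of the form shown above.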
On the execution host and the master host, verify that the $SGE_ROOT and $SGE_CELL variables are set properly. If the $SGE_ROOT environment variable is not set on the execution host, set it by typing:
# SGE_ROOT=/opt/sge; export SGE_ROOT
Always manually confirm that you have set the $SGE_ROOT environment variable correctly. Type:
# echo $SGE_ROOT
Check that files are visible in the NFS-mounted directory and owned by either root or sgeadmin (depending on what you chose during installation of the qmaster):
cd $SGE_ROOT/$SGE_CELL && ls -l
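A listing shows that something is mounted, but not that the files the execution daemon needs are readable. A minimal sketch of a more specific check follows; the function name check_cell_visible is hypothetical, and the two files tested (act_qmaster and settings.sh in the cell's common directory) are an assumption about what a configured cell contains.

```shell
# Hypothetical helper: verify that key files of the cell are readable.
# Arguments: SGE root directory, cell name.
check_cell_visible() {
    local root=$1 cell=$2 f rc=0
    for f in act_qmaster settings.sh; do
        if [ ! -r "$root/$cell/common/$f" ]; then
            echo "missing or unreadable: $root/$cell/common/$f"
            rc=1
        fi
    done
    return $rc
}
```

Called as check_cell_visible "$SGE_ROOT" "$SGE_CELL", it returns non-zero and names the missing file when the NFS mount is absent or incomplete.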
On the master host: add the host to the list of execution hosts (this is not strictly necessary):
qconf -ae
The -ae option (add execution host) displays an editor that contains a configuration template for an execution host. The editor is either the default vi editor or the editor that corresponds to the EDITOR environment variable.
In this template you specify the hostname, which should be the name of the execution host you want to configure. In the vi screen, change the name and save the template. See the host_conf(5) man page for a detailed description of the template entries to be changed.
hostname template
load_scaling NONE
complex_values NONE
user_lists NONE
xuser_lists NONE
projects NONE
xprojects NONE
usage_scaling NONE
report_variables NONE
Depending on your distribution you might need to correct a couple of bugs.
There are two installers for Grid engine:
On open source distributions the GUI-based installer rarely works, and sometimes is not even shipped with the distribution, so you are forced to use the command-line installer. On commercial distributions the GUI installer typically works.
One advantage of the GUI-based installer is that it allows you to install multiple execution hosts at once. You can achieve the same effect with the command-line installer by using Expect.
If you have both, you have the luxury of deciding which installation method is best for you (the GUI installer is pretty nice). Some considerations.
For detailed, step-by-step instruction see
On the execution host you need to ensure two things:
Ensure a proper environment after reboot by sourcing the SGE environment in /etc/profile or copying the file $SGE_ROOT/default/common/settings.sh to /etc/profile.d/sge.sh
Ensure that the execution daemon is registered using chkconfig and starts automatically on reboot.
# chkconfig sgeexecd.$SGE_CLUSTER_NAME on
# chkconfig --list sgeexecd.$SGE_CLUSTER_NAME
sgeexecd.p6444 0:off 1:off 2:on 3:on 4:on 5:on 6:off
# service sgeexecd.$SGE_CLUSTER_NAME start
starting sge_execd
The first start of the execution daemon takes two to three minutes or more. It's really slow even on a very fast server.
#!/bin/bash
#
# Post install operations for SGE execution host
#
. $SGE_ROOT/default/common/settings.sh
# Add sgeexecd.$SGE_CLUSTER_NAME (or whatever your cluster name is) to the default services on levels 3 and 5
chkconfig sgeexecd.$SGE_CLUSTER_NAME on
# On the execution host: start the sge_execd service
service sgeexecd.$SGE_CLUSTER_NAME start
# Add the necessary commands to /etc/profile
echo ". $SGE_ROOT/default/common/settings.sh" >> /etc/profile
On the master host: specify a queue for this host. That can be done either by adding it to an existing queue or by copying an existing queue, renaming it, and saving it under a new name.
To add a new queue using an existing queue as a template, use the commands
# qconf -sq c32.q > b2.q
hostlist lus
processors 32
slots 32
shell /bin/bash
pe_list ms
qconf -Aq b2.q
root@lus17 added "b2.q" to cluster queue list
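The copy-and-rename procedure can be done without an interactive editor by filtering the dumped queue definition. The following is a sketch only: the function name retarget_queue is hypothetical, it assumes the dump has one attribute per line with the name in the first column, and it does not handle attribute values that qconf wraps over several lines.

```shell
# Hypothetical helper: rewrite qname and hostlist in a queue definition
# read from stdin. Arguments: new queue name, new host list.
retarget_queue() {
    local newname=$1 newhosts=$2
    awk -v q="$newname" -v h="$newhosts" '
        $1 == "qname"    { printf "%-21s %s\n", $1, q; next }
        $1 == "hostlist" { printf "%-21s %s\n", $1, h; next }
        { print }
    '
}

# Intended use on the master host (not executed here):
#   qconf -sq c32.q | retarget_queue b2.q b2 > b2.q && qconf -Aq b2.q
```

Other attributes, such as slots or pe_list, pass through unchanged and can still be adjusted in the resulting file before it is loaded with qconf -Aq.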
See Creating and modifying SGE Queues
Verify that the execution host has been declared with the command
qconf -sel
which lists all execution hosts.
You can also use qconf -se <hostname> to see the configured parameters (usually only the hostname is configured). See Configuring Hosts From the Command Line.
ps -ef | grep sge
Specifically, you should see that the sge_execd daemon is running.
/sbin/service sgeexecd.p6444 start
If it starts correctly, you might have forgotten to run the chkconfig command.
Grid Engine messages can be found in syslog during startup:
After startup the daemons log their messages in their spool directories.
Here $EXEC_HOSTNAME is the hostname of the execution host whose messages we want to see.
Mar 02, 2016 | liv.ac.uk
This is Son of Grid Engine version v8.1.9.
See <http://arc.liv.ac.uk/repos/darcs/sge-release/NEWS> for information on recent changes. See <https://arc.liv.ac.uk/trac/SGE> for more information.
The .deb and .rpm packages and the source tarball are signed with PGP key B5AEEEA9.
* sge-8.1.9.tar.gz, sge-8.1.9.tar.gz.sig: Source tarball and PGP signature
* RPMs for Red Hat-ish systems, installing into /opt/sge with GUI installer and Hadoop support:
  * gridengine-8.1.9-1.el5.src.rpm: Source RPM for RHEL, Fedora
  * gridengine-*8.1.9-1.el6.x86_64.rpm: RPMs for RHEL 6 (and CentOS, SL)
  See <https://copr.fedorainfracloud.org/coprs/loveshack/SGE/> for hwloc 1.6 RPMs if you need them for building/installing RHEL5 RPMs.
* Debian packages, installing into /opt/sge, not providing the GUI installer or Hadoop support:
  * sge_8.1.9.dsc, sge_8.1.9.tar.gz: Source packaging. See <http://wiki.debian.org/BuildingAPackage>, and see <http://arc.liv.ac.uk/downloads/SGE/support/> if you need (a more recent) hwloc.
  * sge-common_8.1.9_all.deb, sge-doc_8.1.9_all.deb, sge_8.1.9_amd64.deb, sge-dbg_8.1.9_amd64.deb: Binary packages built on Debian Jessie.
* debian-8.1.9.tar.gz: Alternative Debian packaging, for installing into /usr.
* arco-8.1.6.tar.gz: ARCo source (unchanged from previous version)
* dbwriter-8.1.6.tar.gz: compiled dbwriter component of ARCo (unchanged from previous version)
More RPMs (unsigned, unfortunately) are available at < http://copr.fedoraproject.org/coprs/loveshack/SGE/ >.
I've done this a couple of times by now, and I always forget one step or another. Most of the information is on http://verahill.blogspot.com.au/2012/06/setting-up-sun-grid-engine-with-three.html but here it is in a briefer form:
In the example I've used krypton as the node name, and 192.168.1.180 as the IP.
My front node is called beryllium and has an IP of 192.168.1.1.
0. On the front node
Add the new node name to the front node/queue master
Add execution host
qconf -ae
which opens a text file in vim
Edited hostname (krypton) but nothing else. Saving returns
added host krypton to exec host list
Add krypton as a submit host
qconf -as krypton
krypton added to submit host list
Doing this before touching the node makes life a little bit easier.
1. Edit /etc/hosts on the node
Leave
127.0.0.1 localhost
but remove
127.0.1.1 krypton
and make sure that it says
192.168.1.180 krypton
instead.
Throw in
192.168.1.1 beryllium
as well.
2. Install SGE on node
sudo apt-get install gridengine-exec gridengine-client
You'll be asked about
Configure automatically: yes
Cell name: rupert
Master hostname: beryllium
3. Add node to queue and group
I maintain separate queues and groups depending on how many cores each node has. See e.g. http://verahill.blogspot.com.au/2012/06/setting-up-sun-grid-engine-with-three.html for how to create queues and groups.
If they already exist, just do
qconf -aattr hostgroup hostlist krypton @fourcores
qconf -aattr queue slots "[krypton=4]" fourcores.q
to add the new node.
4. Add pe to queue if necessary
Since I have different queues depending on the number of cores of a node, I tend to have to fiddle with this.
See e.g. http://verahill.blogspot.com.au/2012/06/setting-up-sun-grid-engine-with-three.html for how to create pe:s.
If the pe you need is already created, you can do
qconf -mq fourcores.q
and edit pe_list
On the front node, do
qhost
HOSTNAME   ARCH        NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
-------------------------------------------------------------------------------
global     -              -     -       -       -       -       -
beryllium  lx26-amd64     3  0.16    7.8G    5.3G   14.9G  398.2M
boron      lx26-amd64     6  6.02    7.6G    1.6G   14.9G     0.0
helium     lx26-amd64     2     -    2.0G       -    1.9G       -
lithium    lx26-amd64     3     -    3.9G       -     0.0       -
neon       lx26-amd64     8  8.01   31.4G    1.3G   59.6G     0.0
krypton    lx26-amd64     4  4.01   15.6G    2.8G   14.9G     0.0
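In the qhost listing, helium and lithium show "-" in the LOAD column, which typically means that their execution daemons are not reporting to the qmaster. A sketch of a filter that picks such hosts out of the listing follows; the function name hosts_without_load is hypothetical, and the column layout is assumed to match the output above (two header lines, LOAD in the fourth column).

```shell
# Hypothetical helper: print the hosts whose LOAD column is "-".
# Reads qhost output on stdin.
hosts_without_load() {
    # skip the two header lines and the "global" pseudo-host
    awk 'NR > 2 && $1 != "global" && $4 == "-" { print $1 }'
}

# Intended use on the front node (not executed here):
#   qhost | hosts_without_load
```

A freshly added node that appears in this list is a hint to check that its execution daemon is running and can reach the qmaster.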
This section describes installing the Sun Grid Engine software. These instructions are streamlined for installations particular to the Sun Shared Visualization 1.1 software.
Complete Sun Grid Engine documentation, including an installation guide, is available at:
http://docs.sun.com/app/docs/coll/1017.3
To Prepare to Install the Sun Grid Engine Software
This procedure is for installations on all Solaris and Linux servers.
1. Determine which host is to be the queue master (qmaster) and which host is to be the NFS server for your grid.
If the resources are available, the same host can perform both roles.
2. Determine which hosts are to be the execution hosts for your grid.
If the resources are available and these systems are configured with graphics accelerators, the execution hosts can also be the graphics servers.
Note - Execution hosts need the korn shell, ksh. Solaris hosts include ksh by default, but Linux hosts might need ksh to be installed.
3. Determine your installation root directory.
The package default is /gridware/sge, however the Sun Grid Engine documentation calls this <sge_root> or /sge_root. These instructions use the variable, $SGE_ROOT.
4. Become superuser of the NFS server and declare the variable:
# setenv SGE_ROOT /gridware/sge
If you chose a different installation root directory in Step 3, type that directory name instead of /gridware/sge.
5. Create the base directory for $SGE_ROOT if the path has multiple directory components:
# mkdir /gridware
6. Determine an SGE administrative login that can be used on all systems intended to be administration hosts.
For example, you might plan to use these parameters:
or /gridware/sge (if that is your SGE_ROOT choice)
The Sun Grid Engine administrator can have a different user ID than sgeadmin. However, the administrative user ID (530 in this example) must be available across all hosts in the grid.
On SuSE hosts, group 4 (adm) might not already be defined in /etc/group. In that case, you need to add that group.
7. Create the sgeadmin user on the NFS server for your grid.
Use the values you selected in Step 6, as in this example:
# useradd -u 530 -g 4 -d $SGE_ROOT -m -s /bin/tcsh -c "Sun Grid Engine Admin" sgeadmin
8. Assign the sgeadmin user a password:
# passwd sgeadmin
9. Append the following lines to the sgeadmin .cshrc file:
if ( $?prompt == 1 ) then
  if ( -f /gridware/sge/default/common/settings.csh ) then
    source /gridware/sge/default/common/settings.csh
  endif
endif
Replace /gridware/sge with the value of $SGE_ROOT if different.
Do not use the $SGE_ROOT variable itself in Step 9, as the variable will not be set in a fresh shell until the settings.csh file is sourced.
You might choose to do the same for root's .cshrc or .tcshrc, or the equivalent file for root's shell.
10. Continue the installation of software on the NFS server by performing one of these procedures:
1. Permit $SGE_ROOT to be shared (exported) by the NFS server.
If your base directory of $SGE_ROOT is already shared, you do not need to perform this step.
On the Linux NFS server, append the following line to the /etc/exports file:
/gridware *(rw,sync,no_root_squash)
where /gridware is the base directory of your $SGE_ROOT.
2. Inform the operating system of the changes you have made:
- For SuSE Linux:
# /etc/init.d/nfs*server stop ; /etc/init.d/nfs*server start
- For Red Hat Linux:
# /etc/init.d/nfs restart
3. If the system automounts using the hosts map, you can test the accessibility of the $SGE_ROOT directory from other systems on the network with this command:
# ls /net/nfsserverhostname/$SGE_ROOT
4. From each server in the grid, access the NFS server's $SGE_ROOT as each server's $SGE_ROOT using /etc/vfstab, /etc/fstab, or automounting.
Note - Submit hosts (client machines) also need to mount the NFS server's $SGE_ROOT.
Execution hosts must not mount the NFS server with the nosuid option, as setuid is needed by Sun Grid Engine's rlogin and rsh for its qrsh command to work properly.
a. Add the following line to the /etc/fstab file:
nfsserverhostname:/gridware /gridware nfs auto,suid,bg,intr 0 0
Your Linux system might also need the no_root_squash option in this line.
b. Type these two commands:
# mkdir /gridware
# mount /gridware
where /gridware is the base directory of your $SGE_ROOT.
Note - If you use NIS to resolve host names, add the server's name to the /etc/hosts file and ensure that files is in the hosts entry in the /etc/nsswitch.conf file. Mounting occurs before the NIS name service is started. The first hostname on the /etc/hosts line for the execution host itself should not include a domain.
5. Determine port numbers.
You must determine an available port on the qmaster system. Sun Grid Engine components will use this port to communicate with the qmaster daemon. This port must be a single port number that is available on all current or prospective submit and execution hosts in your grid.
These port numbers can be any value, but the following port numbers have been assigned by the Internet Assigned Numbers Authority (IANA):
- sge_qmaster 6444/tcp
- sge_execd 6445/tcp
Note - For more information about IANA, see:
If you are running a firewall on any execution host, ensure that the execution daemon's port allows traffic in.
6. Communicate the port numbers to the hosts.
These port numbers can be communicated to the hosts involved either by inserting the port numbers into every host's /etc/inet/services or /etc/services file or by setting Sun Grid Engine environment variables. The latter method, detailed in Step 4 of To Complete the Software Installation, is more convenient, because each Sun Grid Engine user already needs to use a Sun Grid Engine environment setup file. If you allow Sun Grid Engine to use this setup file, you will not have to add sge entries into every host's services file.
To use this environment variable technique, set these environment variables before you invoke ./install_qmaster in Step 2 of To Complete the Software Installation. Use the port numbers determined in Step 5 in place of 6444 and 6445 in these commands:
# setenv SGE_QMASTER_PORT 6444
# setenv SGE_EXECD_PORT 6445
The lines you include in the setup file for Sun Grid Engine will be executed by Step 5 of To Complete the Software Installation. (After installation, you will need to ensure that the setup file's set and export environment variables are naming SGE_QMASTER_PORT and SGE_EXECD_PORT.)
7. As superuser of the NFS server, install the Sun Grid Engine packages into $SGE_ROOT.
The NFS server will need both Sun Grid Engine architecture-independent common files and architecture-dependent files for the architecture of every submit and execution host. (Each architecture is a pairing of processor instruction set and operating system.) You might also choose to install documentation files.
These files can be installed from RPM packages on a Linux system. Files for additional nonnative architectures need to be installed from tar bundles, which is explained in Step 1 in To Complete the Software Installation.
Refer to TABLE 3-2, which lists commonly used Sun Grid Engine 6.1 Linux software RPM packages and the download files that contain those packages. If you are installing a release other than Sun Grid Engine 6.1, the download file names will refer to that version instead of reading 6_1. Also, newer versions of Sun Grid Engine might use file names that say sge instead of n1ge.
TABLE 3-2 Sun Grid Engine 6.1 Linux Software RPM Packages
Sun Grid Engine architecture-independent common files, including documentation files
Linux kernel 2.4 or 2.6, glibc >= 2.3.2, for AMD Opteron or Intel EM64T
Linux kernel 2.4 or 2.6, glibc >= 2.3.2, for 32-bit x86
Common but Optional
Accounting and Reporting Console (ARCo) for all architectures, not needed for the core product (optional).
To install each of the RPM packages you selected, type an rpm command line such as this:
# rpm -iv /path-to-rpm-file/sun-n1ge-rest-of-filename.rpm
8. Perform the steps in To Complete the Software Installation.
To Complete the Software Installation
This procedure is for installations on all Solaris and Linux servers.
1. Install additional Sun Grid Engine tar bundles of files needed by hosts with a different operating system than the NFS server.
TABLE 3-3 lists Sun Grid Engine 6.1 software tar bundles, which can install nonnative software on a Solaris or Linux NFS server. Use these bundles to install software on an NFS server as needed to support hosts with a different operating system. (Newer versions of Sun Grid Engine might use file names that say sge instead of n1ge.)
TABLE 3-3 Sun Grid Engine 6.1 Software tar Bundles
Name of tar File Bundle
Architecture independent files (required, but was already installed from packages on the NFS server)
Linux kernel 2.4 or 2.6, glibc >= 2.3.2, for AMD Opteron and Intel EM64T
Linux kernel 2.4 or 2.6, glibc >= 2.2.5, for 32-bit x86
Solaris 8 and higher, for 64-bit SPARC
Solaris 9 and higher, for 32-bit x86
Solaris 10, for 64-bit x64 (such as AMD Opteron)
Accounting and Reporting Console (ARCo) for all architectures, not needed for the core product
Sun Web Console, required for ARCo, Linux, for 32-bit x86
Sun Web Console, required for ARCo, Solaris, for x86
Sun Web Console, required for ARCo, Solaris, for 64-bit SPARC
After you download the additional software you need, you can install the contents of each tar.gz file in the $SGE_ROOT directory with a command such as this:
# gunzip -c n1ge-6_1-platform.tar.gz | (cd $SGE_ROOT; tar xf -)
If you installed any of the tar bundles mentioned in this step, you will need to answer n when the installation script asks (as in Step 3):
Did you install this version with >pkgadd< or did you already verify and set the file permissions of your distribution (enter: y)
2. On the queue master host, type:
# cd $SGE_ROOT ; ./install_qmaster
The Sun Grid Engine installation script begins.
3. The script prompts you for information and requests confirmation of selected values.
As you progress through the script, consider the following:
- The Sun N1 Grid Engine 6 Installation Guide has a table to help plan and record the answers to the questions asked during installation. For the simplest installation, accept all the defaults not discussed in the following text, unless your $SGE_ROOT is not /gridware/sge.
- The installation script asks: "Do you want to install Grid Engine as admin user >sgeadmin<? (y/n)". Answer y, so that all spool files are created as owned by that user. This answer avoids a problem where an execution host's root becomes nobody over NFS and therefore cannot access the spooling directories.
Note - The installation script might instead ask this question: "Do you want to install Grid Engine under a user id other than >root<? (y/n) [y]". Answer y. Later, you are asked for the user ID, which can be sgeadmin (as created in Step 6 of To Prepare to Install the Sun Grid Engine Software).
- The installation script asks: "Did you install this version with >pkgadd< or did you already verify and set the file permissions of your distribution (enter: y)". If you installed exclusively from packages, answer y. If you installed even partially from tar files (as in Step 1) or other means, answer n, and the install_qmaster script sets the file permissions appropriately.
- The installation script asks: "Are all hosts of your cluster in a single DNS domain (y/n)". Unless you are certain that you need domain checking, answer y. Sun Grid Engine then ignores domain components when comparing hostnames.
Execution hosts and the queue master must agree on the primary name of the execution host. If the execution host and the queue master do not agree on hostnames, a host_aliases file in the $SGE_ROOT directory enables SGE to understand that certain names are equivalent. For example, a host_aliases file might include this line:
myhost1 my1 myhost1-ib my1-ib
Every host name on this line is considered equivalent to the first name on the line (myhost1), which is the primary host name. For more details, see the Sun Grid Engine man page for host_aliases (5).
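To see how such a line is interpreted, the lookup can be sketched in a few lines of shell. The function name primary_host_name is hypothetical, and the file format is assumed to be exactly as described here: one group of equivalent names per line, primary name first, with lines starting with # treated as comments.

```shell
# Hypothetical helper: print the primary name for a host according to a
# host_aliases-format file; a name that is not listed is printed unchanged.
# Arguments: host name to look up, path to the aliases file.
primary_host_name() {
    local name=$1 aliases=$2
    awk -v n="$name" '
        $1 !~ /^#/ {
            for (i = 1; i <= NF; i++)
                if ($i == n) { print $1; found = 1; exit }
        }
        END { if (!found) print n }
    ' "$aliases"
}
```

With the example line above, looking up my1-ib prints myhost1, which is the name the execution host and the queue master would both use.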
In addition, Sun Grid Engine requires that a host's unique hostname is associated with a true IP address, not the localhost address 127.0.0.1.
- Select to use the BerkeleyDB, but do not configure a separate BerkeleyDB server.
- If your site uses NIS, a usable group ID range can be determined by studying the output of:
# ypcat -k group.bygid | sort -n | more
Or, ask your administrator for a reasonable range of unused group IDs. Sun Grid Engine uses the group IDs for each of the parallel jobs that are running at a given time.
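Instead of scanning the sorted list by eye, a candidate range can be tested mechanically. This is a sketch under assumptions: the function name gid_range_is_free is hypothetical, the input is one numeric group ID per line, and the ranges in the usage comment are example values, not recommendations.

```shell
# Hypothetical helper: succeed (exit 0) if no group ID read from stdin
# falls inside the inclusive range given as arguments.
# Arguments: lowest ID of the range, highest ID of the range.
gid_range_is_free() {
    awk -v lo="$1" -v hi="$2" '
        $1 + 0 >= lo + 0 && $1 + 0 <= hi + 0 { used = 1 }
        END { exit (used ? 1 : 0) }
    '
}

# Possible use with local groups (not executed here):
#   getent group | cut -d: -f3 | gid_range_is_free 30000 30100 && echo "range is free"
```

On a NIS site the list of IDs would come from the ypcat command above rather than from getent; extracting the ID column from that output is left out of this sketch.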
- When prompted for administrative and submit hosts, include the name of the queue master host as an administrative and submit host, unless you forbid submissions from that host.
- You can create a shadow host that takes over for the qmaster if it becomes unavailable. This action is optional.
- Use the following command to add administrative hosts (which might be configured to be execution hosts) if those hosts were omitted:
# qconf -ah hostname,anotherhost
- You can display the administrative host list by typing:
# qconf -sh
- You can add submit hosts by typing:
# qconf -as myhost,anotherhost,stillmore
- Typing the following displays the submit host list:
# qconf -ss
4. Update environment variables in settings files.
If you decided to communicate the port numbers to all SGE hosts using SGE's environment setup file, you now need to assure that SGE sets the correct port numbers for environment variables SGE_QMASTER_PORT and SGE_EXECD_PORT. (You would have made that choice at Step 6 of To Install the Software on a Solaris System or Step 6 of To Install the Software on a Linux System, and would have determined the port numbers in the step before these steps.)
You might find that the proper variable values were written when you ran install_qmaster.
a. Edit the SGE settings file for csh or tcsh.
The file is $SGE_ROOT/default/common/settings.csh.
b. In the settings.csh file, look for lines such as these:
unsetenv SGE_QMASTER_PORT
unsetenv SGE_EXECD_PORT
If you find such lines, change them to use your port numbers.
You determined the port numbers in Step 5 of To Install the Software on a Solaris System or Step 5 of To Install the Software on a Linux System. For example, change the lines to the following:
setenv SGE_QMASTER_PORT 6444
setenv SGE_EXECD_PORT 6445
c. Edit the SGE settings file for sh, bash, and ksh.
The file is $SGE_ROOT/default/common/settings.sh
d. In the settings.sh file, look for lines such as these:
unset SGE_QMASTER_PORT
unset SGE_EXECD_PORT
If you find such lines, change them to use your port numbers.
For example, change the lines to the following:
SGE_QMASTER_PORT=6444; export SGE_QMASTER_PORT
SGE_EXECD_PORT=6445; export SGE_EXECD_PORT
The settings files contain the lines to unset these environment variables by default. This default behavior is desirable if you had instead decided to enter the port numbers in every SGE host's /etc/services or /etc/inet/services file.
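The edit of settings.sh described in this step can also be expressed as a filter, which makes the change easy to review before it is applied. The sketch below assumes that each unset statement sits on its own line; the function name set_sge_ports is hypothetical, and the result is written to standard output rather than back to the file.

```shell
# Hypothetical helper: print a settings.sh-style file with the two
# "unset" lines replaced by port assignments.
# Arguments: settings file, qmaster port, execd port.
set_sge_ports() {
    local file=$1 qport=$2 eport=$3
    sed -e "s/^[[:space:]]*unset SGE_QMASTER_PORT.*/SGE_QMASTER_PORT=$qport; export SGE_QMASTER_PORT/" \
        -e "s/^[[:space:]]*unset SGE_EXECD_PORT.*/SGE_EXECD_PORT=$eport; export SGE_EXECD_PORT/" \
        "$file"
}

# Possible use (not executed here): review the diff, then replace the file.
#   set_sge_ports $SGE_ROOT/default/common/settings.sh 6444 6445 > /tmp/settings.sh.new
#   diff $SGE_ROOT/default/common/settings.sh /tmp/settings.sh.new
```

Lines that do not match are passed through unchanged, so running the filter on a file that already sets the ports leaves it as it is.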
5. Source the file to set up your environment to use Sun Grid Engine.
- For tcsh/csh users, type:
% source /gridware/sge/default/common/settings.csh
Substitute /gridware/sge with your value of $SGE_ROOT. Consider having root's .login do so.
- For sh/bash/ksh users, type:
$ . /gridware/sge/default/common/settings.sh
Substitute /gridware/sge with your value of $SGE_ROOT. Consider having root's .profile or .bashrc do so.
6. Create the sgeadmin user on each of the other administration hosts of the grid:
# useradd -u 530 -g 4 -d $SGE_ROOT -s /bin/tcsh -c "Sun Grid Engine Admin" sgeadmin
Note - Unlike Step 7 of To Prepare to Install the Sun Grid Engine Software, the -m option is not needed on these other administration hosts. Assign sgeadmin a password, as in Step 8 of that procedure.
Alternatively, you can add the sgeadmin entries to the respective /etc/passwd and /etc/shadow files.
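Because the installation is shared over NFS, the sgeadmin account should have the same numeric UID on every host. The following minimal sketch checks that on one host. The function name check_user_uid is hypothetical, and 530 is only the example UID from the useradd command in this step.

```shell
#!/bin/sh
# Hypothetical helper: verify that a user exists and has the expected UID.
# Returns 0 on a match, 1 on a mismatch, 2 if the user does not exist.
check_user_uid() {
    user=$1
    expected_uid=$2

    actual_uid=$(id -u "$user" 2>/dev/null) || {
        echo "user $user does not exist" >&2
        return 2
    }

    if [ "$actual_uid" != "$expected_uid" ]; then
        echo "user $user has UID $actual_uid, expected $expected_uid" >&2
        return 1
    fi
    return 0
}
```

On each administration host, check_user_uid sgeadmin 530 should return 0 once the account has been created.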
7. As superuser on every execution host, set the SGE_ROOT environment variable and then type:
    # cd $SGE_ROOT ; ./install_execd
You might need to create the execution host's default spooling directory. As superuser on the NFS server, type:
    # mkdir $SGE_ROOT/default/spool/exec-hostname
The same value for exec-hostname is needed in the procedure To Set Up Sun Grid Engine Environment Variables.
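When several execution hosts are added at once, the spool directory step can be wrapped in a small function. This is a minimal sketch; the function name make_exec_spool is hypothetical, and it uses mkdir -p instead of the plain mkdir shown in this step so that it can be re-run safely. Ownership and permissions of the directory are not handled here.

```shell
#!/bin/sh
# Hypothetical helper: create the default spool directory for one execution
# host under the given SGE root and print the resulting path.
make_exec_spool() {
    sge_root=$1
    exec_host=$2
    spool_dir="$sge_root/default/spool/$exec_host"

    # -p makes the call idempotent and creates missing parent directories.
    mkdir -p "$spool_dir" && echo "$spool_dir"
}
```

On the NFS server it could be called in a loop over host names, for example make_exec_spool "$SGE_ROOT" exec-hostname for each execution host.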
8. After the environment is set up, submit a test job.
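To specify the job to execute on your host:
    exechost% qsub -q all.q@`hostname` $SGE_ROOT/examples/jobs/simple.sh
    exechost% qstat -f
The backquotes around hostname substitute the output of the hostname command, so the job is directed to the queue instance on the local host.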
Job output and errors are in the initiating user's home directory, with filenames similar to the following:
    simple.sh.e1
    simple.sh.o1
Note - If you run the job as root, these files are in the execution host's root directory. If you do not know which host executed the job, you do not know which root directory the files are in. Therefore, submit jobs as a user whose home directory is in one place irrespective of execution host or specify the execution hostname explicitly.
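The numeric suffix of the output files is the job ID that qsub reports on submission. The following minimal sketch extracts that ID and prints the expected file names. It assumes that qsub prints a line of the form Your job 1 ("simple.sh") has been submitted, which may differ between SGE versions. Both function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read qsub's confirmation message on stdin and print
# the job ID. Assumption: the message starts with "Your job <id> ".
job_id_from_qsub() {
    sed -n 's/^[Yy]our job \([0-9][0-9]*\) .*/\1/p'
}

# Hypothetical helper: print the stdout and stderr file names that a job is
# expected to leave in the submitting user's home directory.
expected_outputs() {
    job_name=$1
    job_id=$2
    echo "$HOME/${job_name}.o${job_id}"
    echo "$HOME/${job_name}.e${job_id}"
}
```

A typical use is to capture the ID with something like job_id=$(qsub ... | job_id_from_qsub) and then pass it to expected_outputs simple.sh "$job_id" to know which files to inspect once the job has finished.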
To Set Up Sun Grid Engine Environment Variables
Use one of the following commands:
- For tcsh and csh users, type:
    % source /gridware/sge/default/common/settings.csh
Substitute /gridware/sge with your $SGE_ROOT.
- For sh, bash, and ksh users, type:
    $ . /gridware/sge/default/common/settings.sh
Substitute /gridware/sge with your $SGE_ROOT.
Note - These commands add $SGE_ROOT/bin/$ARCH to $path ($PATH in sh, bash, and ksh), add $SGE_ROOT/man to $MANPATH, set $SGE_ROOT, and if needed set $SGE_CELL and $COMMD_PORT.
Messages from Sun Grid Engine can be found in:
- /tmp/qmaster_messages (during Sun Grid Engine queue master startup)
- /tmp/execd_messages (during Sun Grid Engine exec daemon startup)
After startup, the daemons log messages in their spool directories:
- Sun Grid Engine queue master:
- Sun Grid Engine execution daemon:
    $SGE_ROOT/default/spool/exec-hostname/messages
To Verify Your Administrative Hosts
    # qconf -sh
To Add Administrative Hosts
    qconf -ah hostname
To Obtain Current Status
    qstat -f
Note - In the status display, BIP means that the queue permits batch, interactive, and parallel jobs. The status au means that the execution host daemon (execd) is not running successfully or is not communicating with the qmaster process.
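To spot queue instances in the au state quickly, the qstat -f output can be filtered. The following is a minimal sketch, assuming the usual column layout in which the queue name is the first field, the queue type (for example BIP) the second, and the state the sixth; the layout may differ between SGE versions. The function name unreachable_queues is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read "qstat -f" output on stdin and print the names of
# queue instances whose state column contains "au".
# Assumption: field 2 is the queue type in capital letters and field 6 is the
# state; header, separator, and job lines do not match this pattern.
unreachable_queues() {
    awk '$2 ~ /^[A-Z]+$/ && $6 ~ /au/ { print $1 }'
}
```

It would be used as qstat -f | unreachable_queues. An empty result suggests that every execd is reporting to the qmaster.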
FAIR USE NOTICE This site contains copyrighted material the use of which has not always been specifically authorized by the copyright owner. We are making such material available in our efforts to advance understanding of environmental, political, human rights, economic, democracy, scientific, and social justice issues, etc. We believe this constitutes a 'fair use' of any such copyrighted material as provided for in section 107 of the US Copyright Law. In accordance with Title 17 U.S.C. Section 107, the material on this site is distributed without profit exclusively for research and educational purposes. If you wish to use copyrighted material from this site for purposes of your own that go beyond 'fair use', you must obtain permission from the copyright owner.
The Last but not Least
Copyright © 1996-2016 by Dr. Nikolai Bezroukov. www.softpanorama.org was created as a service to the UN Sustainable Development Networking Programme (SDNP) in the author's free time. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License.
Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.
This is a Spartan WHYFF (We Help You For Free) site written by people for whom English is not a native language. Grammar and spelling errors should be expected. The site contains some broken links, as it develops like a living tree...
The statements, views and opinions presented on this web page are those of the author (or referenced source) and are not endorsed by, nor do they necessarily reflect, the opinions of the author's present and former employers, SDNP or any other organization the author may be associated with. We do not warrant the correctness of the information provided or its fitness for any purpose.
Last modified: October 01, 2017