Softpanorama

May the source be with you, but remember the KISS principle ;-)
(slightly skeptical) Educational society promoting "Back to basics" movement against IT overcomplexity and  bastardization of classic Unix

Installation of the Grid Engine Execution Host


Introduction

We will assume that installation is performed on RHEL 6.5 or 6.6.  We also assume that NFS is used for sharing files with master host.

The degree of sharing is not that important, but generally $SGE_ROOT/$SGE_CELL should be shared.  The efficiency considerations cited by many are overblown: without careful measurements that identify the real bottleneck you may fall into the classic trap of "premature optimization". As Donald Knuth put it, "premature optimization is the root of all evil", and long before him Talleyrand gave the following advice to young diplomats: "First and foremost, not too much zeal".  Just substitute "novice SGE administrators" for "young diplomats".

The same applies to the choice between classic spooling and Berkeley DB. Without measurements, selecting Berkeley DB is fool's gold.

Before installing an execution host, you first need to install and configure the master host.  The master (qmaster) must be running while you install an execution host.

The installation process is pretty simple:

That's why you need to register the daemon with chkconfig and start it manually. If you share the whole $SGE_ROOT tree or the $SGE_ROOT/$SGE_CELL subtree via NFS, there is no need to run the execution daemon setup script, although this is the least time-consuming part of the installation. Here is how Dave Love explained the situation:

== Set Up ==

After installing the distribution into place you need to run the installation script `inst_sge` (or a wrapper) on your master host, and on all execution hosts which don't share the installation via a distributed file system. This will configure and start up the daemons.

If the installation is shared with the execution hosts, using a shared, writable spool directory, it isn't necessary to do an explicit install on them. Use `inst_sge` to install the execd on the master host and copy to or share with the hosts the `sgeexecd` init script.

Then start this after the qmaster has been configured with the relevant host names (with `qconf -ah`).

You may or may not want to keep the execd on the master host for maintaining global load values, for instance, but you probably want to ensure it has zero slots, so as not to be able to run jobs.

You can install as many execution hosts as you want using the GUI installer: just add as many hosts as you wish and they will be installed one by one in a single batch.

Pre-install

Red Hat installation checklist

On the execution host you need to check the following four preconditions:

  1. Register and patch the server. You might need access to a repository for the installation. I am not sure that the necessary RPMs are present on the RHEL installation DVD.

  2. Update or install Java. Usually Java is already installed, but you still need to verify that. If it is not, install it.

  3. Install the required RPMs or tar files. This depends on your particular SGE version. See for example

  4. Configure NTP. Check using the command: ntpdate -u ntp1.firm.com

In addition you need to perform several steps to ensure proper environment for the execd.

  1. For installation of execution host you need a running Grid Engine qmaster.

  2. It is also necessary that target host is made an administrative host at least for the period of installation. You can verify your current list of administrative hosts with the command:
    qconf -sh

    You can add an administrative host with the command:

    qconf -ah <hostname>
  3. Important: check that the common directory from the master host or NFS server is shared via NFS
  4. Create passwordless login from the master host to the execution host
  5. Check the GID and UID of all users. Check that user IDs are identical to those on the master host. Start with the sgeadmin user: if this account was created by an RPM, it may well have the wrong UID and GID.
  6. Check that the SGE services are present in /etc/services
  7. Verify that the proper directory (for example $SGE_ROOT or $SGE_ROOT/$SGE_CELL) is NFS mounted
  8. Register the new execution host on the master host
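
The administrative host check from step 2 is easy to script. Here is a minimal sketch, assuming that `qconf -sh` prints one host name per line; the function name `is_admin_host` is hypothetical and not part of SGE.

```shell
# is_admin_host HOST
# Reads a host list on stdin (one name per line, the format assumed for
# `qconf -sh` output) and succeeds only if HOST matches a whole line.
is_admin_host() {
    grep -Fxq -- "$1"
}

# Intended use on the master host (not executed here):
#   qconf -sh | is_admin_host b5 || qconf -ah b5
```

The match is exact and case-sensitive, so pass the host name in the same form (short or fully qualified) that the master uses.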

Check if common directory from the master host or NFS server is shared via NFS

If you use an RPM-based installation, then mounting the NFS directory is a prerequisite for installing the RPMs and as such is already done. With a tar-based installation you need to do it now, if this is the first execution host in the cluster.

First you need to decide how much of the $SGE_ROOT tree you want to share. For small installations the master host can also serve as the NFS server, but of course you can get better results with a dedicated server. The simplest SGE installations for small clusters share either the whole $SGE_ROOT tree or only the $SGE_ROOT/$SGE_CELL subtree.

For an RPM-based installation the second method is preferable, as you have already installed the binaries on the host: it makes no sense to put executables on NFS, so the minimum is the $SGE_ROOT/$SGE_CELL/common directory. Of course, in this case you need to install the executables on each execution host. But installing the prerequisite RPMs is enough trouble already (and you need to do it in any case), so two more RPMs do not make much difference.

Generally, for small installations (say, fewer than 32 nodes and 640 cores) that do not carry a heavy load of short jobs (a few minutes each), it makes little difference whether you share $SGE_ROOT or $SGE_ROOT/default. In this particular case sharing less does not really improve efficiency on modern servers. See Usage of NFS in Grid Engine.

Create the directory for shared files (for example /opt/sge/default) and put an appropriate line in /etc/fstab file.

export SGE_ROOT=/opt/sge # SGE installation root directory
export SGE_CELL=default # cell directory
SGE_MASTER=qmaster # hostname of your SGE master host; should be in /etc/hosts
SGE_NFS_SHARE=$SGE_ROOT/$SGE_CELL # specify how much you share
mkdir -p $SGE_NFS_SHARE
# the last two fields (dump, fsck pass) are 0 for a network filesystem
echo "$SGE_MASTER:$SGE_NFS_SHARE $SGE_NFS_SHARE  nfs     vers=3,rw,hard,intr,tcp,rsize=32768,wsize=32768        0 0" >> /etc/fstab
mount $SGE_NFS_SHARE
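
Appending to /etc/fstab with echo is not safe to repeat: a second run adds a duplicate line. Here is a minimal sketch of an idempotent variant. The function name `add_fstab_entry` and its argument order are my own invention, and the dump and fsck fields are written as 0, the usual choice for NFS.

```shell
# add_fstab_entry FSTAB SPEC MOUNTPOINT OPTIONS
# Appends an NFS line to FSTAB unless MOUNTPOINT is already listed there.
add_fstab_entry() {
    fstab=$1 spec=$2 mnt=$3 opts=$4
    # Field 2 of an fstab line is the mount point; comment lines are skipped.
    if awk -v m="$mnt" '$1 !~ /^#/ && $2 == m { found = 1 } END { exit !found }' "$fstab"; then
        return 0    # already present, nothing to do
    fi
    printf '%s %s nfs %s 0 0\n' "$spec" "$mnt" "$opts" >> "$fstab"
}

# Intended use (not executed here):
#   add_fstab_entry /etc/fstab qmaster:/opt/sge/default /opt/sge/default vers=3,rw,hard,tcp
```

Try it on a copy of /etc/fstab first; a broken fstab can prevent the host from booting cleanly.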

Now add the host to the /etc/exports file on the master host or NFS server if you export by host rather than by netmask. For example:

/opt/sge 10.194.186.254(rw,no_root_squash) 10.194.181.26(rw,no_root_squash) 
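
If you export by host, the line grows with every new execution host, so it can be convenient to generate it. Here is a minimal sketch; the helper name `make_export_line` is hypothetical and the options are simply those of the example above.

```shell
# make_export_line DIR HOST...
# Prints one /etc/exports line granting rw,no_root_squash to each HOST.
make_export_line() {
    dir=$1; shift
    line=$dir
    for h in "$@"; do
        line="$line $h(rw,no_root_squash)"
    done
    printf '%s\n' "$line"
}

# Intended use (not executed here):
#   make_export_line /opt/sge 10.194.186.254 10.194.181.26
```

The function only prints the line; review it before putting it into /etc/exports. On most Linux systems `exportfs -ra` then applies the change without a full restart of the NFS service.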

After that, restart the NFS daemon so that it rereads the exports file:

# service nfs restart
Shutting down NFS mountd:                                  [  OK  ]
Shutting down NFS daemon:                                  [  OK  ]
Shutting down NFS quotas:                                  [  OK  ]
Shutting down NFS services:                                [  OK  ]
Starting NFS services:                                     [  OK  ]
Starting NFS quotas:                                       [  OK  ]
Starting NFS daemon:                                       [  OK  ]
Starting NFS mountd:                                       [  OK  ]

Check and, if necessary, create passwordless login from the master host to this execution host

If this was not already done for all execution hosts, you need to set up passwordless login now. This is typically the case when this is the first execution host you are installing. In all other cases the ssh keys have already been generated, and you just need to copy /root/.ssh/authorized_keys from any working execution host.

Tip: if you already have a configured execution host, just copy the authorized_keys file from it.

cd /root/.ssh
scp sge01:/root/.ssh/authorized_keys  . 

Check ssh access from the master host to the node on which you install the execution host (b5 in the example below):
[0]root@m17: # ssh b5
The authenticity of host 'b5 (10.194.181.46)' can't be established.
RSA key fingerprint is 18:35:6e:96:11:77:27:fc:ac:1c:8e:46:36:2b:ae:2b.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'b5,10.194.181.46' (RSA) to the list of known hosts.
Last login: Thu Jul 26 08:29:41 2012 from sge_master.firma.net
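
The interactive check above can also be done non-interactively, which is handy when you verify many nodes. Here is a minimal sketch, assuming OpenSSH; the function name and the `SSH` override variable are hypothetical.

```shell
# check_passwordless_ssh HOST
# Succeeds only if HOST accepts a key-based login without prompting.
# BatchMode=yes makes ssh fail instead of asking for a password.
# The SSH variable is a hypothetical hook for substituting another client.
check_passwordless_ssh() {
    ${SSH:-ssh} -o BatchMode=yes -o ConnectTimeout=5 "$1" true
}

# Intended use on the master host (not executed here):
#   check_passwordless_ssh b5 || echo "b5: passwordless login is not working"
```

With the default host key checking, the test will most likely also fail for a host whose key has not yet been accepted, so log in once by hand first, as shown above.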

Check GID and UID of all users

Check that the sgeadmin user is present (if you are using it) and that its UID and GID are identical to those of the sgeadmin account on the master host. The execution host installer does not check this, and the result is an execution host that cannot communicate with the master; the error is tricky to diagnose if you have forgotten about this step. Typically RPM-based installations create this user as part of the RPM install, but tar-based installations do not, and if you want to use it you need to create it manually.

You also need to verify that the UID and GID of all users and application accounts are identical to those on the master host. Mismatches cause tricky errors, especially if the sgeadmin UID and GID differ or the user ID of an important application does not match.
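
Comparing IDs by eye is error-prone, so here is a minimal sketch of a scripted comparison. It works on passwd-format files (for example the output of `getent passwd sgeadmin` saved on each host); the function names `ids_of` and `same_ids` and the temporary file names are hypothetical.

```shell
# ids_of USER
# Reads passwd-format lines on stdin and prints "uid:gid" for USER.
ids_of() {
    awk -F: -v u="$1" '$1 == u { print $3 ":" $4; exit }'
}

# same_ids USER LOCAL_FILE MASTER_FILE
# Succeeds only if USER exists in both files with identical uid and gid.
same_ids() {
    local_ids=$(ids_of "$1" < "$2")
    master_ids=$(ids_of "$1" < "$3")
    [ -n "$local_ids" ] && [ "$local_ids" = "$master_ids" ]
}

# Intended use on the execution host (not executed here):
#   getent passwd sgeadmin > /tmp/local.pw
#   ssh m17 getent passwd sgeadmin > /tmp/master.pw
#   same_ids sgeadmin /tmp/local.pw /tmp/master.pw || echo "sgeadmin UID/GID mismatch"
```

A missing account counts as a mismatch, which is what you want for sgeadmin.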

If this is a reinstallation, make sure that the old version of the execd daemon is not running and that the new version of the qmaster daemon is running. Also remove the old start files from /etc/init.d.

Notes:

Check the presence of SGE services in /etc/services

Check /etc/services. If you installed SGE from RPMs, such as the Son of Grid Engine RPMs, the entries are typically added by the RPMs. If necessary, manually add the ports that you used when configuring the SGE master. For example:

sge_qmaster     6444/tcp                # Grid Engine Qmaster Service
sge_qmaster     6444/udp                # Grid Engine Qmaster Service
sge_execd       6445/tcp                # Grid Engine Execution Service
sge_execd       6445/udp                # Grid Engine Execution Service
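
Whether the entries are already there can be tested before editing the file. Here is a minimal sketch; the function name `has_service` is hypothetical, and the port numbers in the usage lines are those of the example above.

```shell
# has_service NAME PORT_PROTO [FILE]
# Succeeds if FILE (default /etc/services) has a line whose first field
# is NAME and whose second field is PORT_PROTO, for example 6444/tcp.
has_service() {
    awk -v n="$1" -v p="$2" \
        '$1 == n && $2 == p { found = 1 } END { exit !found }' \
        "${3:-/etc/services}"
}

# Intended use (not executed here):
#   has_service sge_qmaster 6444/tcp || echo "sge_qmaster entry is missing"
#   has_service sge_execd   6445/tcp || echo "sge_execd entry is missing"
```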

Verify that proper directory (for example $SGE_ROOT or $SGE_ROOT/$SGE_CELL) is NFS mounted

On both the execution host and the master host, verify that the $SGE_ROOT and $SGE_CELL variables are set properly. If the $SGE_ROOT environment variable is not set on the execution host, set it by typing:

# SGE_ROOT=/opt/sge; export SGE_ROOT

Always manually confirm that you have set the $SGE_ROOT environment variable correctly. Type:

# echo $SGE_ROOT

Check that the files are visible in the NFS-mounted directory and owned by either root or sgeadmin (depending on what you chose during installation of qmaster):

cd $SGE_ROOT/$SGE_CELL && ls -l
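
Seeing files with ls does not prove that the directory is actually mounted over NFS, because a stale local copy looks the same. Here is a minimal sketch of a stricter check, assuming a Linux host where /proc/mounts is available; the function name `is_nfs_mounted` is hypothetical.

```shell
# is_nfs_mounted DIR [MOUNTS_FILE]
# Succeeds if DIR is a mount point of type nfs or nfs4 according to
# MOUNTS_FILE (default /proc/mounts, whose fields are: device, mount
# point, filesystem type, options, ...).
is_nfs_mounted() {
    awk -v d="$1" \
        '$2 == d && $3 ~ /^nfs4?$/ { found = 1 } END { exit !found }' \
        "${2:-/proc/mounts}"
}

# Intended use (not executed here):
#   is_nfs_mounted /opt/sge/default || echo "/opt/sge/default is not NFS mounted"
```

Pass the mount point itself: if you share the whole $SGE_ROOT tree, test $SGE_ROOT and not $SGE_ROOT/$SGE_CELL.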
	

Register new execution host on the master host

On the master host:

  1. Add the host IP to /etc/hosts (RHEL uses the long name as the host name, which is not very convenient for SGE purposes)
  2. Add the host to the list of execution hosts (this is not strictly necessary)

    qconf -ae

    The -ae option (add execution host) displays an editor that contains a configuration template for an execution host. The editor is either the default vi editor or the editor that corresponds to the EDITOR environment variable.

    In this template you specify the hostname, which should be the name of the execution host we want to configure. In the vi screen change the name and save the template. See the host_conf(5) man page for a detailed description of the template entries to be changed.

    hostname              template
    load_scaling          NONE
    complex_values        NONE
    user_lists            NONE
    xuser_lists           NONE
    projects              NONE
    xprojects             NONE
    usage_scaling         NONE
    report_variables      NONE
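
If you add many hosts, the editor step becomes tedious. qconf can also read the host definition from a file with the -Ae option. Here is a minimal sketch that fills in the template shown above; the function name `make_exec_host_conf` and the temporary file name are hypothetical.

```shell
# make_exec_host_conf HOST
# Prints the execution host template with HOST filled in as the hostname.
make_exec_host_conf() {
    cat <<EOF
hostname              $1
load_scaling          NONE
complex_values        NONE
user_lists            NONE
xuser_lists           NONE
projects              NONE
xprojects             NONE
usage_scaling         NONE
report_variables      NONE
EOF
}

# Intended use on the master host (not executed here):
#   make_exec_host_conf b5 > /tmp/b5.conf && qconf -Ae /tmp/b5.conf
```

The field list is copied from the template above; if your SGE version shows different fields in `qconf -ae`, adjust the here-document accordingly.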

Correct bugs

Depending on your distribution you might need to correct a couple of bugs.

  1. Often the execution host installer does not work unless the master host RPM is installed too; it complains about missing files. In this case just install that RPM as well.
  2. There can be a mismatch between the sgeadmin UID and GID on the master host and on this execution host. You need to check and correct this too.

Run the installer

There are two installers for Grid engine:

On open source distributions the GUI-based installer rarely works, and sometimes is not even shipped, so you are forced to use the command-line installer. On commercial distributions the GUI installer typically works.

One advantage of the GUI-based installer is that it allows you to install multiple execution hosts at once. You can achieve the same effect with the command-line installer by using Expect.

If you have both, you have the luxury of deciding which installation method is best for you (the GUI installer is pretty nice). Some considerations.

For detailed, step-by-step instruction see

Post install operations on the execution host

On the execution host: You need to ensure two things:

Notes:

Add the host to the necessary queue

On the master host: specify a queue for this host. This can be done either by adding the host to an existing queue or by copying an existing queue and saving it under a new name.

To add a new queue using an existing queue as a template, use the following commands:

  1. If you want to create a separate queue for testing this host, dump a similar queue of another host to a file
    # qconf -sq c32.q > b2.q 
  2. Edit the file and change qname to the new queue name (b2.q), plus the following parameters as needed
    vi b2.q 
    hostlist              lus 
    processors            32
    slots                 32
    shell                 /bin/bash
    pe_list               ms 
  3. Add the queue from the file under the new name
    qconf -Aq b2.q 
    root@lus17 added "b2.q" to cluster queue list
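
The editing step can be replaced by a filter when only the queue name and the host list change. Here is a minimal sketch; the function name `retarget_queue` is hypothetical, and it assumes that the qname and hostlist values each fit on one line (values continued with a backslash are not handled).

```shell
# retarget_queue NEW_QNAME NEW_HOSTLIST
# Filters a queue definition (as printed by `qconf -sq`) from stdin to
# stdout, replacing the values of the qname and hostlist lines.
retarget_queue() {
    awk -v q="$1" -v h="$2" '
        $1 == "qname"    { printf "%-21s %s\n", "qname", q; next }
        $1 == "hostlist" { printf "%-21s %s\n", "hostlist", h; next }
        { print }
    '
}

# Intended use on the master host (not executed here):
#   qconf -sq c32.q | retarget_queue b2.q b2 > b2.q
#   qconf -Aq b2.q
```

All other parameters (slots, shell, pe_list and so on) pass through unchanged, so review the resulting file before adding the queue.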

See Creating and modifying SGE Queues

  1. Verify that the execution host has been declared with the command

    qconf -sel

    which lists all execution hosts.

    You can also use qconf -se <hostname> to see the configured parameters (usually only the hostname is configured). See Configuring Hosts From the Command Line

Reboot and check if the environment is correct after reboot and that execd daemon starts properly

On the execution host:

Grid Engine messages

Grid Engine messages can be found in syslog during startup:

After startup the daemons log their messages in their spool directories.

Qmaster:

$SGE_ROOT/$SGE_CELL/spool/qmaster/messages

Exec daemon:

$SGE_ROOT/$SGE_CELL/spool/$EXEC_HOSTNAME/messages

Where $EXEC_HOSTNAME is the hostname of the execution host whose messages we want to see.
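
The messages files grow quickly, and most lines are informational. Here is a minimal sketch of a filter that shows only the problems. It assumes the usual pipe-separated layout of these files (date and time, thread, host, severity, text) with severity E for errors and C for critical messages; check a few lines of your own file first, as I have not verified this for every SGE version. The function name `sge_problems` is hypothetical.

```shell
# sge_problems FILE
# Prints the lines of an SGE messages file whose severity field is
# E (error) or C (critical). Assumed layout of each line:
#   date time|thread|host|severity|text
sge_problems() {
    awk -F'|' '$4 == "E" || $4 == "C"' "$1"
}

# Intended use (not executed here):
#   sge_problems "$SGE_ROOT/$SGE_CELL/spool/b5/messages" | tail -n 20
```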



Old News ;-)

[Nov 08, 2018] SGE Installation on Centos 7

Nov 08, 2018 | liv.ac.uk

I installed SGE on Centos 7 back in January this year. If my recollection is correct, the procedure was analogous to the instructions for Centos 6. There were some issues with the firewalld service (make sure that it is not blocking SGE), as well as some issues with SSL.
Check out these threads for reference:

http://arc.liv.ac.uk/pipermail/sge-discuss/2017-January/001050.html

Max

[Aug 17, 2018] Rocks 7.0 Manzanita (CentOS 7.4)

Aug 17, 2018 | www.rocksclusters.org

Operating System Base

Rocks 7.0 (Manzanita) x86_64 is based upon CentOS 7.4 with all updates available as of 1 Dec 2017.

Building a bare-bones compute cluster

Building a more complex cluster

In addition to above, select the following rolls:

  • area51
  • fingerprint
  • ganglia
  • kvm (used for virtualization)
  • hpc
  • htcondor (used independently or in conjunction with sge)
  • perl
  • python
  • sge
  • zfs-linux (used to build reliable storage systems)
Building Custom Clusters

If you wish to build a custom cluster, you must choose from our a la carte selection, but make sure to download the required base , kernel and both CentOS rolls. The CentOS rolls include CentOS 7.4 w/updates pre-applied. Most users will want the full updated OS so that other software can be added.

MD5 Checksums

Please double check the MD5 checksums for all the rolls you download.

Downloads

All ISOs are available for downloads from here . Individual links are listed below.

Name            Description
kernel          Rocks Bootable Kernel Roll (required)
base            Rocks Base Roll (required)
core            Core Roll (required)
CentOS          CentOS Roll (required)
Updates-CentOS  CentOS Updates Roll (required)
kvm             Support for building KVM VMs on cluster nodes
ganglia         Cluster monitoring system from UCB
area51          System security related services and utilities
zfs-linux       ZFS On Linux Roll. Build and Manage Multi Terabyte File Systems.
fingerprint     Fingerprint application dependencies
hpc             Rocks HPC Roll
htcondor        HTCondor High Throughput Computing (version 8.2.8)
sge             Sun Grid Engine (Open Grid Scheduler) job queueing system
perl            Support for Newer Version of Perl
python          Python 2.7 and Python 3.x
openvswitch     Rocks integration of OpenVswitch

[Aug 17, 2018] Installation of Son of Grid Engine(SGE) on CentOS7 by byeon iksu

Oct 15, 2017 | biohpc.blogspot.com

Installation of Son of Grid Engine(SGE) on CentOS7

SGE Master installation

master# hostnamectl set-hostname qmaster.local

master# vi /etc/hosts
192.168.56.101 qmaster.local qmaster
192.168.56.102 compute01.local compute01

master# mkdir -p /BiO/src
master# yum -y install epel-release
master# yum -y install jemalloc-devel openssl-devel ncurses-devel pam-devel libXmu-devel hwloc-devel hwloc hwloc-libs java-devel javacc ant-junit libdb-devel motif-devel csh ksh xterm db4-utils perl-XML-Simple perl-Env xorg-x11-fonts-ISO8859-1-100dpi xorg-x11-fonts-ISO8859-1-75dpi
master# groupadd -g 490 sgeadmin
master# useradd -u 495 -g 490 -r -m -d /home/sgeadmin -s /bin/bash -c "SGE Admin" sgeadmin
master# visudo
%sgeadmin ALL=(ALL) NOPASSWD: ALL
master# cd /BiO/src
master# wget http://arc.liv.ac.uk/downloads/SGE/releases/8.1.9/sge-8.1.9.tar.gz
master# tar zxvfp sge-8.1.9.tar.gz
master# cd sge-8.1.9/source/
master# sh scripts/bootstrap.sh && ./aimk && ./aimk -man
master# export SGE_ROOT=/BiO/gridengine && mkdir $SGE_ROOT
master# echo Y | ./scripts/distinst -local -allall -libs -noexit
master# chown -R sgeadmin.sgeadmin /BiO/gridengine

master# cd $SGE_ROOT
master# ./install_qmaster
press enter at the intro screen
press "y" and then specify sgeadmin as the user id
leave the install dir as /BiO/gridengine
You will now be asked about port configuration for the master, normally you would choose the default (2) which uses the /etc/services file
accept the sge_qmaster info
You will now be asked about port configuration for the execution daemon, normally you would choose the default (2) which uses the /etc/services file
accept the sge_execd info
leave the cell name as "default"
Enter an appropriate cluster name when requested
leave the spool dir as is
press "n" for no windows hosts!
press "y" (permissions are set correctly)
press "y" for all hosts in one domain
If you have Java available on your Qmaster and wish to use SGE Inspect or SDM then enable the JMX MBean server and provide the requested information - probably answer "n" at this point!
press enter to accept the directory creation notification
enter "classic" for classic spooling (berkeleydb may be more appropriate for large clusters)
press enter to accept the next notice
enter "20000-20100" as the GID range (increase this range if you have execution nodes capable of running more than 100 concurrent jobs)
accept the default spool dir or specify a different folder (for example if you wish to use a shared or local folder outside of SGE_ROOT)
enter an email address that will be sent problem reports
press "n" to refuse to change the parameters you have just configured
press enter to accept the next notice
press "y" to install the startup scripts
press enter twice to confirm the following messages
press "n" for a file with a list of hosts
enter the names of your hosts who will be able to administer and submit jobs (enter alone to finish adding hosts)
skip shadow hosts for now (press "n")
choose "1" for normal configuration and agree with "y"
press enter to accept the next message and "n" to refuse to see the previous screen again and then finally enter to exit the installer

master# cp /BiO/gridengine/default/common/settings.sh /etc/profile.d/
master# qconf -ah compute01.local
compute01.local added to administrative host list

master# yum -y install nfs-utils
master# vi /etc/exports
/BiO 192.168.56.0/24(rw,no_root_squash)

master# systemctl start rpcbind nfs-server
master# systemctl enable rpcbind nfs-server

SGE Client installation

compute01# yum -y install hwloc-devel
compute01# hostnamectl set-hostname compute01.local
compute01# vi /etc/hosts
192.168.56.101 qmaster.local qmaster
192.168.56.102 compute01.local compute01

compute01# groupadd -g 490 sgeadmin
compute01# useradd -u 495 -g 490 -r -m -d /home/sgeadmin -s /bin/bash -c "SGE Admin" sgeadmin

compute01# yum -y install nfs-utils
compute01# systemctl start rpcbind
compute01# systemctl enable rpcbind
compute01# mkdir /BiO
compute01# mount -t nfs 192.168.56.101:/BiO /BiO
compute01# vi /etc/fstab
192.168.56.101:/BiO /BiO nfs defaults 0 0

compute01# export SGE_ROOT=/BiO/gridengine
compute01# export SGE_CELL=default
compute01# cd $SGE_ROOT
compute01# ./install_execd
compute01# cp /BiO/gridengine/default/common/settings.sh /etc/profile.d/

[Mar 02, 2016] Son of Grid engine version 8.1.9 is available

Mar 02, 2016 | liv.ac.uk

README

This is Son of Grid Engine version v8.1.9.

See <http://arc.liv.ac.uk/repos/darcs/sge-release/NEWS> for information on recent changes. See <https://arc.liv.ac.uk/trac/SGE> for more information.

The .deb and .rpm packages and the source tarball are signed with PGP key B5AEEEA9.

* sge-8.1.9.tar.gz, sge-8.1.9.tar.gz.sig:  Source tarball and PGP signature

* RPMs for Red Hat-ish systems, installing into /opt/sge with GUI
  installer and Hadoop support:

  * gridengine-8.1.9-1.el5.src.rpm:  Source RPM for RHEL, Fedora

  * gridengine-*8.1.9-1.el6.x86_64.rpm:  RPMs for RHEL 6 (and
    CentOS, SL)

  See <https://copr.fedorainfracloud.org/coprs/loveshack/SGE/> for
  hwloc 1.6 RPMs if you need them for building/installing RHEL5 RPMs.

* Debian packages, installing into /opt/sge, not providing the GUI
  installer or Hadoop support:

  * sge_8.1.9.dsc, sge_8.1.9.tar.gz:  Source packaging.  See
    <http://wiki.debian.org/BuildingAPackage>, and see
    <http://arc.liv.ac.uk/downloads/SGE/support/> if you need (a more
    recent) hwloc.

  * sge-common_8.1.9_all.deb, sge-doc_8.1.9_all.deb,
    sge_8.1.9_amd64.deb, sge-dbg_8.1.9_amd64.deb: Binary packages
    built on Debian Jessie.

* debian-8.1.9.tar.gz:  Alternative Debian packaging, for installing
  into /usr.

* arco-8.1.6.tar.gz:  ARCo source (unchanged from previous version)

* dbwriter-8.1.6.tar.gz:  compiled dbwriter component of ARCo
  (unchanged from previous version)

More RPMs (unsigned, unfortunately) are available at <http://copr.fedoraproject.org/coprs/loveshack/SGE/>.

Briefly: Adding a new node to SGE

I've done this a couple of times by now, and I always forget one step or another. Most of the information is on http://verahill.blogspot.com.au/2012/06/setting-up-sun-grid-engine-with-three.html but here it is in a briefer form:

In the example I've used krypton as the node name, and 192.168.1.180 as the IP.
My front node is called beryllium and has an IP of 192.168.1.1.

0. On the front node
Add the new node name to the front node/queue master

Add execution host

qconf -ae 
which opens a text file in vim

Edited hostname (krypton) but nothing else. Saving returns

added host krypton to exec host list
Add krypton as a submit host
qconf -as krypton
krypton added to submit host list
Doing this before touching the node makes life a little bit easier.

1. Edit /etc/hosts on the node

Leave

	127.0.0.1 localhost
but remove
127.0.1.1 krypton
and make sure that it says
192.168.1.180 krypton
instead.

Throw in

192.168.1.1 beryllium
as well.

2. Install SGE on node

sudo apt-get install gridengine-exec gridengine-client
You'll be asked about

Configure automatically: yes
Cell name: rupert
Master hostname: beryllium

3. Add node to queue and group

I maintain separate queues and groups depending on how many cores each node has. See e.g. http://verahill.blogspot.com.au/2012/06/setting-up-sun-grid-engine-with-three.html for how to create queues and groups.

If they already exist, just do

qconf -aattr hostgroup hostlist krypton @fourcores
qconf -aattr queue slots "[krypton=4]" fourcores.q
to add the new node.

4. Add pe to queue if necessary

Since I have different queues depending on the number of cores of a node, I tend to have to fiddle with this.

See e.g. http://verahill.blogspot.com.au/2012/06/setting-up-sun-grid-engine-with-three.html for how to create pe:s.

If the pe you need is already created, you can do

qconf -mq fourcores.q
and edit pe_list

5. Check

On the front node, do
qhost
HOSTNAME    ARCH        NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
-------------------------------------------------------------------------------
global      -           -     -     -       -       -       -
beryllium   lx26-amd64  3     0.16  7.8G    5.3G    14.9G   398.2M
boron       lx26-amd64  6     6.02  7.6G    1.6G    14.9G   0.0
helium      lx26-amd64  2     -     2.0G    -       1.9G    -
lithium     lx26-amd64  3     -     3.9G    -       0.0     -
neon        lx26-amd64  8     8.01  31.4G   1.3G    59.6G   0.0
krypton     lx26-amd64  4     4.01  15.6G   2.8G    14.9G   0.0
 

Installing the Sun Grid Engine Software

This section describes installing the Sun Grid Engine software. These instructions are streamlined for installations particular to the Sun Shared Visualization 1.1 software.

Complete Sun Grid Engine documentation, including an installation guide, is available at:

http://docs.sun.com/app/docs/coll/1017.3

To Prepare to Install the Sun Grid Engine Software

This procedure is for installations on all Solaris and Linux servers.

1. Determine which host is to be the queue master (qmaster) and which host is to be the NFS server for your grid.

If the resources are available, the same host can perform both roles.

2. Determine which hosts are to be the execution hosts for your grid.

If the resources are available and these systems are configured with graphics accelerators, the execution hosts can also be the graphics servers.


Note - Execution hosts need the korn shell, ksh. Solaris hosts include ksh by default, but Linux hosts might need ksh to be installed.

3. Determine your installation root directory.

The package default is /gridware/sge, however the Sun Grid Engine documentation calls this <sge_root> or /sge_root. These instructions use the variable, $SGE_ROOT.

4. Become superuser of the NFS server and declare the variable:

# setenv SGE_ROOT /gridware/sge

If you chose a different installation root directory in Step 3, type that directory name instead of /gridware/sge.

5. Create the base directory for $SGE_ROOT if the path has multiple directory components:

# mkdir /gridware

6. Determine an SGE administrative login that can be used on all systems intended to be administration hosts.

For example, you might plan to use these parameters:


Parameter       Value
Name            sgeadmin
Group           adm (4)
Home directory  $SGE_ROOT or /gridware/sge (if that is your SGE_ROOT choice)
User ID         530


The Sun Grid Engine administrator can have a different user ID than sgeadmin. However, the administrative user ID (530 in this example) must be available across all hosts in the grid.

On SuSE hosts, group 4 (adm) might not already be defined in /etc/group. In that case, you need to add that group.

7. Create the sgeadmin user on the NFS server for your grid.

Use the values you selected in Step 6, as in this example:

# useradd -u 530 -g 4 -d $SGE_ROOT -m -s /bin/tcsh -c "Sun Grid Engine Admin" sgeadmin

8. Assign the sgeadmin user a password:

# passwd sgeadmin

9. Append the following lines to the sgeadmin .cshrc file:

if ( $?prompt == 1 ) then
	if ( -f /gridware/sge/default/common/settings.csh ) then
           source /gridware/sge/default/common/settings.csh
	endif
endif

Replace /gridware/sge with the value of $SGE_ROOT if different.


Note - Do not use the $SGE_ROOT variable itself in Step 9, as the variable will not be set in a fresh shell until the settings.csh file is sourced.

You might choose to do the same for root's .cshrc or .tcshrc, or the equivalent file for root's shell.

10. Continue the installation of software on the NFS server by performing one of these procedures:

To Install the Software on a Linux System

1. Permit $SGE_ROOT to be shared (exported) by the NFS server.

If your base directory of $SGE_ROOT is already shared, you do not need to perform this step.

On the Linux NFS server, append the following line to the /etc/exports file:

/gridware     *(rw,sync,no_root_squash)

where /gridware is the base directory of your $SGE_ROOT.

2. Inform the operating system of the changes you have made:

3. If the system automounts using the hosts map, you can test the accessibility of the $SGE_ROOT directory from other systems on the network with this command:

# ls /net/nfsserverhostname/$SGE_ROOT

4. From each server in the grid, access the NFS server's $SGE_ROOT as each server's $SGE_ROOT using /etc/vfstab, /etc/fstab, or automounting.


Note - Submit hosts (client machines) also need to mount the NFS server's $SGE_ROOT.

Execution hosts must not mount the NFS server with the nosuid option, as setuid is needed by Sun Grid Engine's rlogin and rsh for its qrsh command to work properly.


a. Add the following line to the /etc/fstab file:

nfsserverhostname:/gridware    /gridware  nfs        auto,suid,bg,intr          0 0

Your Linux system might also need the no_root_squash option in this line.

b. Type these two commands:

# mkdir /gridware
# mount /gridware

where /gridware is the base directory of your $SGE_ROOT.


Note - If you use NIS to resolve host names, add the server's name to the /etc/hosts file and ensure that files is in the hosts entry in the /etc/nsswitch.conf file. Mounting occurs before the NIS name service is started. The first hostname on the /etc/hosts line for the execution host itself should not include a domain.

5. Determine port numbers.

You must determine an available port on the qmaster system. Sun Grid Engine components will use this port to communicate with the qmaster daemon. This port must be a single port number that is available on all current or prospective submit and execution hosts in your grid.

These port numbers can be any value, but the following port numbers have been assigned by the Internet Assigned Numbers Authority (IANA):

sge_qmaster     6444/tcp
sge_execd       6445/tcp


Note - For more information about IANA, see:
http://www.iana.org/assignments/port-numbers

If you are running a firewall on any execution host, ensure that the execution daemon's port allows traffic in.

6. Communicate the port numbers to the hosts.

These port numbers can be communicated to the hosts involved either by inserting the port numbers into every host's /etc/inet/services or /etc/services file or by setting Sun Grid Engine environment variables. The latter method, detailed in Step 4 of To Complete the Software Installation, is more convenient, because each Sun Grid Engine user already needs to use a Sun Grid Engine environment setup file. If you allow Sun Grid Engine to use this setup file, you will not have to add sge entries into every host's services file.

To use this environment variable technique, set these environment variables before you invoke ./install_qmaster in Step 2 of To Complete the Software Installation. Use the port numbers determined in Step 5 in place of 6444 and 6445 in these commands:

# setenv SGE_QMASTER_PORT 6444
# setenv SGE_EXECD_PORT   6445

The lines you include in the setup file for Sun Grid Engine will be executed by Step 5 of To Complete the Software Installation. (After installation, you will need to ensure that the setup file's set and export environment variables are naming SGE_QMASTER_PORT and SGE_EXECD_PORT.)

7. As superuser of the NFS server, install the Sun Grid Engine packages into $SGE_ROOT.

The NFS server will need both Sun Grid Engine architecture-independent common files and architecture-dependent files for the architecture of every submit and execution host. (Each architecture is a pairing of processor instruction set and operating system.) You might also choose to install documentation files.

These files can be installed from RPM packages on a Linux system. Files for additional nonnative architectures need to be installed from tar bundles, which is explained in Step 1 in To Complete the Software Installation.

Refer to TABLE 3-2, which lists commonly used Sun Grid Engine 6.1 Linux software RPM packages and the download files that contain those packages. If you are installing a release other than Sun Grid Engine 6.1, the download file names will refer to that version instead of reading 6_1. Also, newer versions of Sun Grid Engine might use file names that say sge instead of n1ge.

TABLE 3-2 Sun Grid Engine 6.1 Linux Software RPM Packages

Application          RPM Package                                Description
Common               sun-n1ge-common-6.1.0.noarch.rpm           Sun Grid Engine architecture-independent common files, including documentation files
X64                  sun-n1ge-bin-linux24-x64-6.1.0.x86_64.rpm  Linux kernel 2.4 or 2.6, glibc >= 2.3.2, for AMD Opteron or Intel EM64T
X86                  sun-n1ge-bin-linux24-i586-6.1.0.i386.rpm   Linux kernel 2.4 or 2.6, glibc >= 2.3.2, for 32-bit x86
Common but Optional  sun-n1ge-arco-6.1.0.noarch.rpm             Accounting and Reporting Console (ARCo) for all architectures, not needed for the core product (optional)

To install each of the RPM packages you selected, type an rpm command line such as this:

# rpm -iv /path-to-rpm-file/sun-n1ge-rest-of-filename.rpm

8. Perform the steps in To Complete the Software Installation.

To Complete the Software Installation

This procedure is for installations on all Solaris and Linux servers.

1. Install additional Sun Grid Engine tar bundles of files needed by hosts with a different operating system than the NFS server.

TABLE 3-3 lists Sun Grid Engine 6.1 software tar bundles, which can install nonnative software on a Solaris or Linux NFS server. Use these bundles to install software on an NFS server as needed to support hosts with a different operating system. (Newer versions of Sun Grid Engine might use file names that say sge instead of n1ge.)

TABLE 3-3 Sun Grid Engine 6.1 Software tar Bundles

Name of tar File Bundle              Description
n1ge-common.tar.gz                   Architecture independent files (required, but was already installed from packages on the NFS server)
n1ge-6_1-bin-linux24-amd64.tar.gz    Linux kernel 2.4 or 2.6, glibc >= 2.3.2, for AMD Opteron and Intel EM64T
n1ge-6_1-bin-linux24-i586.tar.gz     Linux kernel 2.4 or 2.6, glibc >= 2.2.5, for 32-bit x86
n1ge-6_1-bin-solaris-sparcv9.tar.gz  Solaris 8 and higher, for 64-bit SPARC
n1ge-6_1-bin-solaris-i586.tar.gz     Solaris 9 and higher, for 32-bit x86
n1ge-6_1-bin-solaris-x64.tar.gz      Solaris 10, for 64-bit x64 (such as AMD Opteron)
n1ge-6_1-bin-windows-x86.tar.gz      Microsoft Windows
n1ge-6_1-arco.tar.gz                 Accounting and Reporting Console (ARCo) for all architectures, not needed for the core product
swc_linux_2.2.5.tar.gz               Sun Web Console, required for ARCo, Linux, for 32-bit x86
swc_solx86_2.2.5.tar.gz              Sun Web Console, required for ARCo, Solaris, for x86
swc_sparc_2.2.5.tar.gz               Sun Web Console, required for ARCo, Solaris, for 64-bit SPARC


After you download the additional software you need, you can install the contents of each tar.gz file in the $SGE_ROOT directory with a command such as this:

# gunzip -c n1ge-6_1-platform.tar.gz | (cd $SGE_ROOT; tar xf -)
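
When several bundles are needed, the same pipeline can be wrapped in a loop. The following is a sketch only: unpack_bundles is a hypothetical helper name, and the destination directory is passed explicitly so that the function can be tried against a scratch directory before it is pointed at $SGE_ROOT.

```shell
# Unpack each given .tar.gz bundle into the directory named by the first
# argument, using the same gunzip | tar pipeline as the command above.
# unpack_bundles is a hypothetical helper name.
unpack_bundles() {
    dest=$1
    shift
    [ -d "$dest" ] || { echo "no such directory: $dest" >&2; return 1; }
    for bundle in "$@"; do
        [ -f "$bundle" ] || { echo "missing bundle: $bundle" >&2; return 1; }
        # Stop at the first bundle that fails to extract.
        gunzip -c "$bundle" | (cd "$dest" && tar xf -) || return 1
    done
}
```

A typical call would be `unpack_bundles "$SGE_ROOT" bundle1.tar.gz bundle2.tar.gz`, where the bundle names stand for the files you downloaded.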

If you installed any of the tar bundles mentioned in this step, you will need to answer n when the installation script asks (as in Step 3):

Did you install this version with >pkgadd< or did you already verify and set the file permissions of your distribution (enter: y)

2. On the queue master host, type:

# cd $SGE_ROOT ; ./install_qmaster

The Sun Grid Engine installation script begins.

3. The script prompts you for information and requests confirmation of selected values.

As you progress through the script, consider the following:


Note - The installation script might instead ask this question: "Do you want to install Grid Engine under a user id other than >root<? (y/n) [y]". Answer y. Later, you are asked for the user ID, which can be sgeadmin (as created in Step 6 of To Prepare to Install the Sun Grid Engine Software).

Execution hosts and the queue master must agree on the primary name of the execution host. If the execution host and the queue master do not agree on hostnames, a host_aliases file under the $SGE_ROOT directory (normally $SGE_ROOT/$SGE_CELL/common/host_aliases) enables SGE to understand that certain names are equivalent. For example, a host_aliases file might include this line:

myhost1 my1 myhost1-ib my1-ib

Every host name on this line is considered equivalent to the first name on the line (myhost1), which is the primary host name. For more details, see the Sun Grid Engine man page for host_aliases (5).
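
The lookup rule can be illustrated with a short helper that maps any name to its primary name. This is a sketch of the rule as described here, not of Grid Engine's own resolver; the function name primary_host_name is hypothetical, and names that do not appear in the file are assumed to map to themselves.

```shell
# Print the primary host name for a given name, using a host_aliases-style
# file in which the first name on each line is the primary name.
# Arguments: host name, path to the host_aliases file.
primary_host_name() {
    awk -v name="$1" '
        {
            for (i = 1; i <= NF; i++)
                if ($i == name) { print $1; found = 1; exit }
        }
        END { if (!found) print name }   # assumption: unknown names map to themselves
    ' "$2"
}
```

With the example line above, `primary_host_name my1-ib host_aliases` prints myhost1.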

In addition, Sun Grid Engine requires that a host's unique hostname is associated with a true IP address, not the localhost address 127.0.0.1.
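
A quick way to catch the loopback problem is to scan the hosts file before running the installer. The sketch below assumes that the host name is resolved from a hosts-style file; host_has_real_address is a hypothetical helper name, and it treats the whole 127.0.0.0/8 range as loopback.

```shell
# Return 0 if the hosts-style file maps the name to at least one address
# and none of those addresses starts with "127." (loopback range).
# Arguments: host name, file path (defaults to /etc/hosts).
host_has_real_address() {
    awk -v name="$1" '
        { sub(/#.*/, "") }                  # drop comments before matching
        {
            for (i = 2; i <= NF; i++)
                if ($i == name) {
                    seen = 1
                    if ($1 ~ /^127\./) loopback = 1
                }
        }
        END { exit ((seen && !loopback) ? 0 : 1) }
    ' "${2:-/etc/hosts}"
}
```

For example, `host_has_real_address myhost1 || echo "fix /etc/hosts first"` flags a host whose name is attached to the 127.0.0.1 line.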

Or, ask your administrator for a reasonable range of unused group IDs. Sun Grid Engine uses the group IDs for each of the parallel jobs that are running at a given time.

4. Update environment variables in settings files.

If you decided to communicate the port numbers to all SGE hosts using SGE's environment setup file, you now need to ensure that SGE sets the correct port numbers for environment variables SGE_QMASTER_PORT and SGE_EXECD_PORT. (You would have made that choice at Step 6 of To Install the Software on a Solaris System or Step 6 of To Install the Software on a Linux System, and would have determined the port numbers in the step before these steps.)

You might find that the proper variable values were written when you ran install_qmaster.

a. Edit the SGE settings file for csh or tcsh.

The file is $SGE_ROOT/default/common/settings.csh.

b. In the settings.csh file, look for lines such as these:

unsetenv SGE_QMASTER_PORT
unsetenv SGE_EXECD_PORT

If you find such lines, change them to use your port numbers.

You determined the port numbers in Step 5 of To Install the Software on a Solaris System or Step 5 of To Install the Software on a Linux System. For example, change the lines to the following:

setenv SGE_QMASTER_PORT 6444
setenv SGE_EXECD_PORT   6445

c. Edit the SGE settings file for sh, bash, and ksh.

The file is $SGE_ROOT/default/common/settings.sh

d. In the settings.sh file, look for lines such as these:

unset SGE_QMASTER_PORT
unset SGE_EXECD_PORT

If you find such lines, change them to use your port numbers.

For example, change the lines to the following:

SGE_QMASTER_PORT=6444;   export SGE_QMASTER_PORT
SGE_EXECD_PORT=6445;     export SGE_EXECD_PORT

The settings files contain the lines to unset these environment variables by default. This default behavior is desirable if you had instead decided to enter the port numbers in every SGE host's /etc/services or /etc/inet/services file.
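
The edit to settings.sh in steps c and d can also be scripted. The sketch below assumes that the unset lines appear exactly as shown above, starting in the first column; set_ports_in_settings_sh is a hypothetical helper name. It writes the rewritten file to standard output so that you can review the result before replacing the original.

```shell
# Rewrite the "unset" lines of a settings.sh-style file so that the two
# port variables are set and exported instead. Output goes to stdout.
# Arguments: file path, qmaster port, execd port.
set_ports_in_settings_sh() {
    file=$1
    qmaster_port=$2
    execd_port=$3
    sed \
        -e "s/^unset SGE_QMASTER_PORT\$/SGE_QMASTER_PORT=$qmaster_port; export SGE_QMASTER_PORT/" \
        -e "s/^unset SGE_EXECD_PORT\$/SGE_EXECD_PORT=$execd_port; export SGE_EXECD_PORT/" \
        "$file"
}
```

For example, `set_ports_in_settings_sh $SGE_ROOT/default/common/settings.sh 6444 6445 > /tmp/settings.sh.new` produces a candidate file to compare against the original with diff.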

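5. Source the file to set up your environment to use Sun Grid Engine.

For csh or tcsh:

# source /gridware/sge/default/common/settings.csh

Substitute /gridware/sge with your value of $SGE_ROOT. Consider having root's .login do so.

For sh, bash, or ksh:

# . /gridware/sge/default/common/settings.sh

Substitute /gridware/sge with your value of $SGE_ROOT. Consider having root's .profile or .bashrc do so.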

6. Create the sgeadmin user on each of the other administration hosts of the grid:

# useradd -u 530 -g 4 -d $SGE_ROOT -s /bin/tcsh -c "Sun Grid Engine Admin" sgeadmin

Note - Unlike Step 7 of To Prepare to Install the Sun Grid Engine Software, the -m option is not needed for these other administration hosts. Assign the sgeadmin a password, as in Step 8 of that procedure.

Alternatively, you can add the sgeadmin entries to the respective /etc/passwd and /etc/shadow files.

7. As superuser on every execution host, set the SGE_ROOT environment variable and then type:

# cd $SGE_ROOT ; ./install_execd

You might need to create the execution host's default spooling directory. As superuser on the NFS server, type:

# mkdir $SGE_ROOT/default/spool/exec-hostname

The same value for exec-hostname is needed in the procedure To Set Up Sun Grid Engine Environment Variables.
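
If you have many execution hosts, the spooling directories can be created in one pass on the NFS server. This is a minimal sketch assuming the default cell name used above; make_exec_spool_dirs is a hypothetical helper name, and the host names passed to it must be the same exec-hostname values the hosts will use.

```shell
# Create $SGE_ROOT/default/spool/<exec-hostname> for each host name given.
# Arguments: the SGE root directory, then one or more execution host names.
make_exec_spool_dirs() {
    sge_root=$1
    shift
    for host in "$@"; do
        mkdir -p "$sge_root/default/spool/$host" || return 1
    done
}
```

For example, `make_exec_spool_dirs "$SGE_ROOT" myhost1 myhost2` creates both directories, leaving any that already exist untouched.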

8. After the environment is set up, submit a test job.

To specify the job to execute on your host:

exechost% qsub -q all.q@`hostname` $SGE_ROOT/examples/jobs/simple.sh
exechost% qstat -f

Job output and errors are in the initiating user's home directory, with filenames similar to the following:

simple.sh.e1			simple.sh.o1

Note - If you run the job as root, these files are in the execution host's root directory. If you do not know which host executed the job, you do not know which root directory the files are in. Therefore, submit jobs as a user whose home directory is in one place irrespective of execution host or specify the execution hostname explicitly.

To Set Up Sun Grid Engine Environment Variables

Use one of the following commands.

For csh or tcsh:

% source /gridware/sge/default/common/settings.csh

For sh, bash, or ksh:

$ . /gridware/sge/default/common/settings.sh

Substitute /gridware/sge with your $SGE_ROOT.


Note - These commands add $SGE_ROOT/bin/$ARCH to $path, add $SGE_ROOT/man to $MANPATH, set $SGE_ROOT, and, if needed, set $SGE_CELL and $COMMD_PORT.

Messages from Sun Grid Engine can be found in the following files. After startup, the daemons log messages in their spool directories:

$SGE_ROOT/default/spool/qmaster/messages
$SGE_ROOT/default/spool/exec-hostname/messages

To Verify Your Administrative Hosts

# qconf -sh

To Add Administrative Hosts

# qconf -ah hostname

To Obtain Current Status

% qstat -f

Note - In the status display, BIP means that the queue permits batch, interactive, and parallel jobs. Also, the status au means the execution host daemon (execd) is not successfully running and communicating with the qmaster process.
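
To pick out the queue instances in the au state from a long listing, the output can be filtered as sketched below. This assumes the usual qstat -f layout in which a queue instance line begins with queue@host and carries the state letters in the sixth column; unreachable_queues is a hypothetical helper name, and the column position should be checked against the output of your own version.

```shell
# Read "qstat -f" output on stdin and print the queue instances whose
# state column (assumed to be the sixth field) contains "au".
unreachable_queues() {
    awk '$1 ~ /@/ && NF >= 6 && $6 ~ /au/ { print $1 }'
}
```

Typical usage is `qstat -f | unreachable_queues`, which prints one queue instance per line for every host whose execd is not reporting to the qmaster.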

Copyright © 1996-2021 by Softpanorama Society. www.softpanorama.org was initially created as a service to the (now defunct) UN Sustainable Development Networking Programme (SDNP) without any remuneration. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License. Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.

Last modified: March 12, 2019