We will assume that the installation is performed on RHEL 6.5 or 6.6. We also assume that NFS is used for sharing files with the master host.
The degree of sharing is not that important, but generally $SGE_ROOT/$SGE_CELL should be shared. The efficiency considerations cited by many are overblown: without careful measurements that identify the real bottleneck, you can fall into the classic trap called "premature optimization". As Donald Knuth used to say, "premature optimization is the root of all evil", and long before him Talleyrand gave the following advice to young diplomats: "First and foremost, not too much zeal." Just substitute "novice SGE administrators" for "young diplomats".
The same applies to the choice between classic spooling and Berkeley DB. Without measurements, selecting Berkeley DB is fool's gold.
There are three dependencies that need to be installed so that the installation can proceed with just the standard RHEL repositories.
jemalloc x86_64 3.6.0-1.el6 epel
hwloc.x86_64 0:1.5-3.el6_5
================================================================================================
 Package            Arch      Version        Repository                               Size
================================================================================================
Installing:
 gridengine         x86_64    8.1.8-1.el6    /gridengine-8.1.8-1.el6.x86_64           40 M
 perl-XML-Simple    noarch    2.18-6.el6     /perl-XML-Simple-2.18-6.el6.noarch      155 k
Installing for dependencies:
 hwloc              x86_64    1.5-3.el6_5    rhel-x86_64-server-6                    1.4 M
 jemalloc           x86_64    3.6.0-1.el6    epel                                    100 k

Transaction Summary
================================================================================================
Install       4 Package(s)
The first thing to do is to create the necessary directories and mount the NFS share. If this is the first execution host in the cluster, you need to decide how much of the $SGE_ROOT tree you want to share.
For small installations the master host can also serve as the NFS server, but of course you can get better results using a dedicated server. Select how much you need to share. The simplest SGE installations for small clusters share either
For an RPM-based installation the second method is preferable, as you have already installed the binaries on the host: it makes no sense to put executables on NFS, so the minimum is the $SGE_ROOT/$SGE_CELL/common directory. Of course, in this case you need to install the executables on each execution host. But installing the prerequisite RPMs is trouble enough (and you need to do it in any case), so two more RPMs do not make much difference.
Installation of the RPMs should be carried out using YUM as any additional software dependencies will be automatically resolved from already installed RPMs. You need the following RPMs to be put in some directory:
# ll
total 17644
-rw-r--r-- 1 root root 14941856 Nov  5 10:30 gridengine-8.1.8-1.el6.x86_64.rpm
-rw-r--r-- 1 root root  1440192 Nov  5 10:30 gridengine-execd-8.1.8-1.el6.x86_64.rpm
-rw-r--r-- 1 root root  1490312 Sep 22 06:33 hwloc-1.5-3.el6_5.x86_64.rpm
-rw-r--r-- 1 root root   102624 Apr  1  2014 jemalloc-3.6.0-1.el6.x86_64.rpm
-rw-r--r-- 1 root root    74068 Nov  6 12:39 perl-XML-Simple-2.18-6.el6.noarch.rpm
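A quick pre-check can confirm that all five files from the listing above are present before yum is started. The following is a minimal sketch; the function name `missing_rpms` is hypothetical, and the file names are simply copied from the listing, so they need adjusting for any other version.

```shell
# Hypothetical helper: print the RPM files from the listing above that are
# absent from a directory (default: the current directory).
missing_rpms() {
    dir=${1:-.}
    for f in \
        gridengine-8.1.8-1.el6.x86_64.rpm \
        gridengine-execd-8.1.8-1.el6.x86_64.rpm \
        hwloc-1.5-3.el6_5.x86_64.rpm \
        jemalloc-3.6.0-1.el6.x86_64.rpm \
        perl-XML-Simple-2.18-6.el6.noarch.rpm
    do
        # Report only the files that are not there.
        [ -f "$dir/$f" ] || echo "$f"
    done
}

# Usage (as root, in the directory holding the RPMs):
#   if [ -z "$(missing_rpms .)" ]; then yum install ./*.rpm; fi
```

An empty result means nothing is missing; any printed name is a file that still has to be downloaded.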
If you put all the necessary RPMs into a single directory, you can install the execution host using the command
yum install *.rpm
You will see the following messages:
[0]root@lustwzb2: # yum install *.rpm
Loaded plugins: product-id, refresh-packagekit, rhnplugin, security, subscription-manager
This system is not registered to Red Hat Subscription Management. You can use subscription-manager to register.
This system is receiving updates from RHN Classic or RHN Satellite.
rhel-x86_64-server-6                                              | 1.8 kB     00:00
Setting up Install Process
Examining gridengine-8.1.8-1.el6.x86_64.rpm: gridengine-8.1.8-1.el6.x86_64
Marking gridengine-8.1.8-1.el6.x86_64.rpm to be installed
Examining gridengine-execd-8.1.8-1.el6.x86_64.rpm: gridengine-execd-8.1.8-1.el6.x86_64
Marking gridengine-execd-8.1.8-1.el6.x86_64.rpm to be installed
Examining hwloc-1.5-3.el6_5.x86_64.rpm: hwloc-1.5-3.el6_5.x86_64
Marking hwloc-1.5-3.el6_5.x86_64.rpm to be installed
Examining jemalloc-3.6.0-1.el6.x86_64.rpm: jemalloc-3.6.0-1.el6.x86_64
Marking jemalloc-3.6.0-1.el6.x86_64.rpm to be installed
Examining perl-XML-Simple-2.18-6.el6.noarch.rpm: perl-XML-Simple-2.18-6.el6.noarch
Marking perl-XML-Simple-2.18-6.el6.noarch.rpm to be installed
Resolving Dependencies
--> Running transaction check
---> Package gridengine.x86_64 0:8.1.8-1.el6 will be installed
---> Package gridengine-execd.x86_64 0:8.1.8-1.el6 will be installed
--> Processing Dependency: xterm for package: gridengine-execd-8.1.8-1.el6.x86_64
---> Package hwloc.x86_64 0:1.5-3.el6_5 will be installed
---> Package jemalloc.x86_64 0:3.6.0-1.el6 will be installed
---> Package perl-XML-Simple.noarch 0:2.18-6.el6 will be installed
--> Running transaction check
---> Package xterm.x86_64 0:253-1.el6 will be installed
--> Finished Dependency Resolution

Dependencies Resolved

================================================================================================
 Package             Arch      Version        Repository                                Size
================================================================================================
Installing:
 gridengine          x86_64    8.1.8-1.el6    /gridengine-8.1.8-1.el6.x86_64            40 M
 gridengine-execd    x86_64    8.1.8-1.el6    /gridengine-execd-8.1.8-1.el6.x86_64     3.8 M
 hwloc               x86_64    1.5-3.el6_5    /hwloc-1.5-3.el6_5.x86_64                1.9 M
 jemalloc            x86_64    3.6.0-1.el6    /jemalloc-3.6.0-1.el6.x86_64             315 k
 perl-XML-Simple     noarch    2.18-6.el6     /perl-XML-Simple-2.18-6.el6.noarch       155 k
Installing for dependencies:
 xterm               x86_64    253-1.el6      rhel-x86_64-server-6                     357 k

Transaction Summary
================================================================================================
Install       6 Package(s)

Total size: 47 M
Total download size: 357 k
Installed size: 46 M
Is this ok [y/N]: y
Downloading Packages:
xterm-253-1.el6.x86_64.rpm                                        | 357 kB     00:00
Running rpm_check_debug
Running Transaction Test
Transaction Test Succeeded
Running Transaction
  Installing : hwloc-1.5-3.el6_5.x86_64                 1/6
  Installing : jemalloc-3.6.0-1.el6.x86_64              2/6
  Installing : perl-XML-Simple-2.18-6.el6.noarch        3/6
  Installing : gridengine-8.1.8-1.el6.x86_64            4/6
  Installing : xterm-253-1.el6.x86_64                   5/6
  Installing : gridengine-execd-8.1.8-1.el6.x86_64      6/6
  Verifying  : gridengine-8.1.8-1.el6.x86_64            1/6
  Verifying  : gridengine-execd-8.1.8-1.el6.x86_64      2/6
  Verifying  : jemalloc-3.6.0-1.el6.x86_64              3/6
  Verifying  : xterm-253-1.el6.x86_64                   4/6
  Verifying  : perl-XML-Simple-2.18-6.el6.noarch        5/6
  Verifying  : hwloc-1.5-3.el6_5.x86_64                 6/6

Installed:
  gridengine.x86_64 0:8.1.8-1.el6    gridengine-execd.x86_64 0:8.1.8-1.el6    hwloc.x86_64 0:1.5-3.el6_5
  jemalloc.x86_64 0:3.6.0-1.el6      perl-XML-Simple.noarch 0:2.18-6.el6

Dependency Installed:
  xterm.x86_64 0:253-1.el6

Complete!
Now you need to correct a bug in the creation of the sgeadmin account:
The RPM creates the account using a generic useradd command and does not synchronize the UID and GID with the master host. This is a nasty bug: if the number of accounts and groups on the execution host differs from the master host (and typically it does), the UID and GID will be different. As a result, your execution daemon will not be able to communicate with the master host. The installer does not detect this error. See how b2 looks in the qhost output below:
z99: # qhost
HOSTNAME   ARCH       NCPU NSOC NCOR NTHR  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
----------------------------------------------------------------------------------------
global     -             -    -    -    -     -       -       -       -       -
z99        lx-amd64     40    2   20   40     -   62.9G       -   15.6G       -
b1         lx-amd64     20    2   20   20  0.00   62.9G  945.9M     0.0     0.0
b2         -             -    -    -    -     -       -       -       -       -
For example, you can have on the execution host
sgeadmin:x:496:492:Grid Engine admin:/:/sbin/nologin
And on the master host
sgeadmin:x:495:490:Grid Engine admin:/:/sbin/nologin
You now need to execute the commands
groupmod -g 490 sgeadmin
usermod -u 495 sgeadmin
chown sgeadmin:sgeadmin /opt/sge/default
to correct this situation.
In other words, having the RPM create the sgeadmin account is not such a good idea. It is better to do it in the installer, as the installer runs as root and can read any files on the shared part of the master tree, if any.
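The comparison of the two passwd entries can be sketched as a small shell function. This is an illustration only: the name `sge_id_fix_cmds` is hypothetical, and the function merely prints the groupmod/usermod commands it would suggest, so that they can be reviewed before being run as root.

```shell
# Hypothetical helper: given the sgeadmin passwd(5) line from the master and
# the one from the execution host, print the commands that would align the
# local UID/GID with the master. Nothing is changed on the system.
sge_id_fix_cmds() {
    master_line=$1
    local_line=$2
    m_uid=$(printf '%s\n' "$master_line" | cut -d: -f3)
    m_gid=$(printf '%s\n' "$master_line" | cut -d: -f4)
    l_uid=$(printf '%s\n' "$local_line" | cut -d: -f3)
    l_gid=$(printf '%s\n' "$local_line" | cut -d: -f4)
    [ "$m_gid" = "$l_gid" ] || echo "groupmod -g $m_gid sgeadmin"
    [ "$m_uid" = "$l_uid" ] || echo "usermod -u $m_uid sgeadmin"
    return 0
}

# Example with the two entries quoted in the text (master first):
sge_id_fix_cmds \
    'sgeadmin:x:495:490:Grid Engine admin:/:/sbin/nologin' \
    'sgeadmin:x:496:492:Grid Engine admin:/:/sbin/nologin'
```

Files that were already created with the old IDs (for example the cell directory) still need a chown afterwards.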
Generally, for small installations (say, fewer than 32 nodes and 640 cores) that do not carry a heavy load of short jobs (several minutes long), it makes little difference whether you share $SGE_ROOT or $SGE_ROOT/default. In this particular case sharing less does not really improve efficiency on modern servers. See Usage of NFS in Grid Engine.
For example
export SGE_ROOT=/opt/sge             # installation directory should be identical with master host
export SGE_CELL=default              # should be the same as on master host
SGE_MASTER=qmaster                   # hostname of your SGE master host. Should be in /etc/hosts
SGE_NFS_SHARE=$SGE_ROOT/$SGE_CELL    # specify how much you share
echo "$SGE_MASTER:$SGE_NFS_SHARE $SGE_NFS_SHARE nfs vers=3,rw,hard,intr,tcp,rsize=32768,wsize=32768 1 2" >> /etc/fstab
mount $SGE_NFS_SHARE
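Appending to /etc/fstab with a plain echo adds a duplicate line every time the commands are re-run. A minimal sketch of an idempotent variant follows; the function name `add_sge_fstab_entry` is hypothetical, the mount options and the trailing "1 2" fields are copied unchanged from the example above, and the fstab path is a parameter so that the function can be tried on a scratch copy first.

```shell
# Hypothetical helper: append the NFS entry to an fstab-style file only if
# no active entry for that mount point exists yet.
add_sge_fstab_entry() {
    fstab=$1      # e.g. /etc/fstab
    master=$2     # NFS server (the SGE master host in this walkthrough)
    share=$3      # e.g. /opt/sge/default
    opts='vers=3,rw,hard,intr,tcp,rsize=32768,wsize=32768'
    # Field 2 of an fstab line is the mount point; commented lines are skipped.
    if awk -v mp="$share" \
        '$1 !~ /^#/ && $2 == mp { found = 1 } END { exit !found }' "$fstab"
    then
        return 0
    fi
    printf '%s:%s %s nfs %s 1 2\n' "$master" "$share" "$share" "$opts" >> "$fstab"
}

# Usage (as root):
#   add_sge_fstab_entry /etc/fstab qmaster /opt/sge/default && mount /opt/sge/default
```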
Copy spool files from /opt/sge/utilbin/lx-amd64/ on the master host to the execution host
scp spool* b2:/opt/sge/utilbin/lx-amd64/spool[di]*
spooldefaults                                 100%  297KB 297.3KB/s   00:00
spoolinit                                     100% 1414KB   1.4MB/s   00:00
./install_execd -nobincheck
During installation, the execution host on which you are installing the sge_execd daemon should be made an administrative host using the command qconf -ah <hostname>. Ensure this is the case (you can remove it as an admin host after the install if you wish), then press Enter to continue.
If you share less than $SGE_ROOT, you need to use the -nobincheck option with the installer:
./install_execd -nobincheck
See step by step instructions in Installation of the Grid Engine Execution Host
Grid Engine messages can be found in syslog during startup:
After startup the daemons log their messages in their spool directories.
$SGE_ROOT/$SGE_CELL/spool/qmaster/messages
$SGE_ROOT/$SGE_CELL/spool/$EXEC_HOSTNAME/messages
Here $EXEC_HOSTNAME is the hostname of the execution host whose messages we want to see.
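The two paths can be produced by one small function, which saves typing when checking several hosts. This is a sketch under the assumption that the spool directories live under $SGE_ROOT/$SGE_CELL/spool as shown above (a local spool directory chosen at install time would be elsewhere); the function name `sge_messages_path` and the fallback values are illustrative.

```shell
# Hypothetical helper: print the messages file for the qmaster (default) or
# for a named execution host, following the layout shown above.
sge_messages_path() {
    root=${SGE_ROOT:-/opt/sge}     # fallback values are assumptions
    cell=${SGE_CELL:-default}
    who=${1:-qmaster}              # "qmaster" or an execution host name
    echo "$root/$cell/spool/$who/messages"
}

# Usage:
#   tail -n 20 "$(sge_messages_path)"       # qmaster log
#   tail -n 20 "$(sge_messages_path b2)"    # log of execution host b2
```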
Mar 02, 2016 | liv.ac.uk
README
This is Son of Grid Engine version v8.1.9.
See <http://arc.liv.ac.uk/repos/darcs/sge-release/NEWS> for information on recent changes. See <https://arc.liv.ac.uk/trac/SGE> for more information.
The .deb and .rpm packages and the source tarball are signed with PGP key B5AEEEA9.
* sge-8.1.9.tar.gz, sge-8.1.9.tar.gz.sig: Source tarball and PGP signature
* RPMs for Red Hat-ish systems, installing into /opt/sge with GUI installer and Hadoop support:
  * gridengine-8.1.9-1.el5.src.rpm: Source RPM for RHEL, Fedora
  * gridengine-*8.1.9-1.el6.x86_64.rpm: RPMs for RHEL 6 (and CentOS, SL)
  See <https://copr.fedorainfracloud.org/coprs/loveshack/SGE/> for hwloc 1.6 RPMs if you need them for building/installing RHEL5 RPMs.
* Debian packages, installing into /opt/sge, not providing the GUI installer or Hadoop support:
  * sge_8.1.9.dsc, sge_8.1.9.tar.gz: Source packaging. See <http://wiki.debian.org/BuildingAPackage>, and see <http://arc.liv.ac.uk/downloads/SGE/support/> if you need (a more recent) hwloc.
  * sge-common_8.1.9_all.deb, sge-doc_8.1.9_all.deb, sge_8.1.9_amd64.deb, sge-dbg_8.1.9_amd64.deb: Binary packages built on Debian Jessie.
  * debian-8.1.9.tar.gz: Alternative Debian packaging, for installing into /usr.
* arco-8.1.6.tar.gz: ARCo source (unchanged from previous version)
* dbwriter-8.1.6.tar.gz: compiled dbwriter component of ARCo (unchanged from previous version)

More RPMs (unsigned, unfortunately) are available at <http://copr.fedoraproject.org/coprs/loveshack/SGE/>.
fsl.fmrib.ox.ac.uk
This is a quick walk through to get Grid Engine going on Linux for those who would like to use it for something like FSL. This documentation is a little old, being written when the Grid Engine software was owned by Sun and often referred to as SGE (Sun Grid Engine). However, this covers the basic requirements. A quick start guide for Ubuntu/Debian is available here, but more detailed setup can be found on this page.
Since the demise of the open source (Sun) Grid Engine, various ports have sprung up. Ubuntu/Debian package the last publicly available release (6.2u5), but users of Red Hat variants (CentOS, Scientific Linux) or Debian/Ubuntu users wishing to use a more modern release should look to installing Son of Grid Engine which makes available RPM and DEB packages and is still actively maintained (last update November 2013).
Grid Engine generally consists of one master (qmaster) and a number of execute (exec) hosts. Note that the qmaster machine can also be an exec host, which is fine for small deployments, but large clusters should keep these functions separate.
This documentation was originally produced by A. Janke ([email protected]) and is now maintained by the FSL team.
NFS
Although Grid Engine can be configured such that all machines are self contained, the instructions here assume that at least some of the Grid Engine folders are shared amongst the controller (qmaster) and clients (exec hosts). To achieve this you will typically need to setup one or more NFS shares, typically at least the configuration files (see http://arc.liv.ac.uk/SGE/howto/nfsreduce.html). Further, the FSL binaries and datasets to be operated on should be made available to all exec hosts in the same filesystem location. In the case of the FSL software, you could install this to the same location on all execution hosts or install to one location and NFS mount this to the same location on all hosts. In the case of datasets, the instructions here assume you are using NFS mounts, but through prolog and epilog scripts it is possible to setup Grid Engine to copy data to/from exec hosts.
Setting up NFS shares is beyond the scope of this document.
Name services
Grid Engine needs to be able to locate exec hosts/qmasters based on host name. If all of your hosts are known to your DNS service, no further work is needed. If you don't have a DNS zone, you may need to configure the local /etc/hosts file to resolve hostnames, or look into the host aliases (man host_aliases) configuration.
User accounts
Grid Engine runs the scheduled job as the user who submitted it, using the textual name form (not numeric ID). Consequently, all exec hosts need to know about all users who are going to submit jobs. In a very small scale setup you may wish to add the required users directly to each exec host, but this quickly becomes unmanageable, so we would recommend setting up some kind of centralised user database, e.g. LDAP, Active Directory.
Setting up shared user accounts is beyond the scope of this document.
Admin account
The Grid Engine software has to run as a privileged user in order to be able to run jobs as the submitting user. However, as this is a potential security issue, the grid software that communicates with the network can be run under an admin account that doesn't have root access. This account needs to be available on all cluster hosts, so either set this up locally, or add it to your central LDAP/user account system.
If you decide to have a locally defined daemon account then set this up as follows (run as the root user) (this is Red Hat dialect, for Ubuntu/Debian use the interactive adduser command).
useradd --home /opt/sge --system sgeadmin
which will add a system account (e.g. no home folder creation, no ageing of the account etc). This should be run on the qmaster and all exec hosts.
Service ports
Grid Engine communicates over two statically configured ports. These ports have to be the same on all computers, and can be configured in the file /etc/services or by changing the Grid Engine configuration setup files that all users need to source to be able to use the software. The latter option is best where you need to have more than one cluster in a location, as each qmaster/exec host has to communicate with the different clusters on different ports. Modern Linux distributions are already setup with entries for Grid Engine (use grep sge_qmaster /etc/services to confirm). If your distribution does not include entries, then you need to add the following to this file:
sge_qmaster     6444/tcp    # Grid Engine Qmaster Service
sge_qmaster     6444/udp    # Grid Engine Qmaster Service
sge_execd       6445/tcp    # Grid Engine Execution Service
sge_execd       6445/udp    # Grid Engine Execution Service
commenting out any prior definitions for the ports 6444 and 6445.
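The grep check mentioned above only confirms that some sge_qmaster line exists. A slightly more thorough sketch reports exactly which of the four entries are absent; the function name `missing_sge_services` is hypothetical, and the file path is a parameter so that the check can be run against a copy.

```shell
# Hypothetical helper: print the Grid Engine service entries that are missing
# from a services(5)-style file (default: /etc/services).
missing_sge_services() {
    file=${1:-/etc/services}
    for entry in sge_qmaster:6444 sge_execd:6445; do
        name=${entry%%:*}
        port=${entry##*:}
        for proto in tcp udp; do
            # An entry counts only if name and port/protocol both match;
            # commented-out lines start with "#" and therefore do not match.
            if ! awk -v n="$name" -v p="$port/$proto" \
                '$1 == n && $2 == p { found = 1 } END { exit !found }' "$file"
            then
                printf '%s\t%s/%s\n' "$name" "$port" "$proto"
            fi
        done
    done
}

# Usage: review the output first, then append it as root, for example
#   missing_sge_services /etc/services >> /etc/services
```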
... ... ...
Installation
Where we refer to $SGE_ROOT, when using the Son Of Grid Engine packages, this will be /opt/sge.
QMaster
Red Hat Enterprise etc
Installation of the RPMs should be carried out using YUM as any additional software dependencies will be automatically resolved. A Grid master can be installed using:
yum install gridengine-8.1.6-1.el6.x86_64.rpm gridengine-qmaster-8.1.6-1.el6.x86_64.rpm gridengine-execd-8.1.6-1.el6.x86_64.rpm gridengine-qmon-8.1.6-1.el6.x86_64.rpm gridengine-guiinst-8.1.6-1.el6.noarch.rpm
Set an environment variable and then install the qmaster as such:
export SGE_ROOT=/opt/sge
cd $SGE_ROOT
./install_qmaster
Now go through the interactive install process:
- press enter at the intro screen
- press "y" and then specify sgeadmin as the user id
- leave the install dir as /opt/sge
- You will now be asked about port configuration for the master, normally you would choose the default (2) which uses the /etc/services file
- accept the sge_qmaster info
- You will now be asked about port configuration for the execution daemon (execd), normally you would choose the default (2) which uses the /etc/services file
- accept the sge_execd info
- leave the cell name as "default"
- Enter an appropriate cluster name when requested
- leave the spool dir as is
- press "n" for no windows hosts!
- press "y" (permissions are set correctly)
- press "y" for all hosts in one domain
- If you have Java available on your Qmaster and wish to use SGE Inspect or SDM then enable the JMX MBean server and provide the requested information - probably answer "n" at this point!
- press enter to accept the directory creation notification
- enter "classic" for classic spooling (berkeleydb may be more appropriate for large clusters)
- press enter to accept the next notice
- enter "20000-20100" as the GID range (increase this range if you have execution nodes capable of running more than 100 concurrent jobs)
- accept the default spool dir or specify a different folder (for example if you wish to use a shared or local folder outside of SGE_ROOT)
- enter an email address that will be sent problem reports
- press "n" to refuse to change the parameters you have just configured
- press enter to accept the next notice
- press "y" to install the startup scripts
- press enter twice to confirm the following messages
- press "n" for a file with a list of hosts
- enter the names of your hosts who will be able to administer and submit jobs (enter alone to finish adding hosts)
- skip shadow hosts for now (press "n")
- choose "1" for normal configuration and agree with "y"
- press enter to accept the next message and "n" to refuse to see the previous screen again and then finally enter to exit the installer
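For the GID range step above, it can help to see how many IDs a given range actually contains, since, as I understand it, each job running concurrently on a host takes one ID from the range. A minimal sketch of the arithmetic follows; the function name `gid_range_size` is hypothetical.

```shell
# Hypothetical helper: number of group IDs in an inclusive "first-last" range.
gid_range_size() {
    range=$1
    first=${range%-*}    # text before the dash
    last=${range#*-}     # text after the dash
    echo $((last - first + 1))
}

gid_range_size 20000-20100    # prints 101
```

If a node is expected to run more concurrent jobs than this number, widen the range accordingly (for example 20000-20500).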
Now that we are back to a shell (finally) we need to add a few things to our root .bashrc so that we can access the SGE binaries. Add the following lines to /root/.bashrc
# SGE settings
export SGE_ROOT=/opt/sge
export SGE_CELL=default
if [ -e $SGE_ROOT/$SGE_CELL ]
then
    . $SGE_ROOT/$SGE_CELL/common/settings.sh
fi
And then be sure to re-source your .bashrc
. /root/.bashrc
Now we can add our own username as an admin so that we can manage the system without becoming root.
qconf -am <myusername>
e.g. qconf -am jbloggs if your username is jbloggs.
Exec Host
The process for installing exec hosts is as follows
- Add the exec host to the master host as an admin host. If your exec host is called client.foo.com then run this on your master host:
qconf -ah client.foo.com
- On the client (client.foo.com)
- Add the sgeadmin username as per above
- Add the lines to /etc/services if required
- Add the SGE bits to /root/.bashrc and re-source it (. /root/.bashrc)
- Ensure the binaries have been installed
- Set an environment variable and then install the exec host (this might be the same machine as the queue master, for example if you only have one computer)
export SGE_ROOT=/opt/sge
cd $SGE_ROOT
./install_execd
- Now go through the interactive install process:
- The installer will ask that you check that this host has been added as an administrative host with the qconf -ah <hostname> command. Ensure this is the case (you can remove it as an admin host after the install if you wish), then press enter to continue
- Make sure the Grid Engine root matches that configured on the Qmaster (/opt/sge)
- Ensure the cell name matches that configured on the master ("default" is usually fine)
- Accept the sge_execd port setting
- Accept the message about the host being known as an admin host
- Make a decision about the spool directory. For medium to large clusters local spool directories are the best option, for small (this should be an NFS mount) or stand-alone installs the default is fine. An appropriate local spool folder name might be /var/spool/sge. If you choose to have a local spool folder you will now receive a warning that the change of 'execd_spool_dir' not being effective before execd has been restarted - you will have to stop/start the execd after completing the install for this to take effect.
- press "y" to install the startup scripts
- confirm you have read the following messages
- When asked about adding a default queue instance for this host answer "n" - FSL requires specific queues, so it is better to define these rather than the default queue.
- press enter to accept the next message and "n" to refuse to see the previous screen again and then finally enter to exit the installer
Repeat this installation procedure on all of the execution hosts...
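When repeating the procedure over many hosts, the admin-host prerequisite is easy to forget. The sketch below checks a host name against a list of admin hosts read from standard input; on the master, `qconf -sh` prints that list. The function name `is_admin_host` is hypothetical, and the match is exact, so a short name will not match a fully qualified one.

```shell
# Hypothetical helper: succeed if the given host name appears as a whole
# line in the host list supplied on stdin (one name per line).
is_admin_host() {
    grep -qxF -- "$1"
}

# Usage on the master host:
#   qconf -sh | is_admin_host client.foo.com || qconf -ah client.foo.com
```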
To embark on the installation of the gridengine packages, run the following command on your terminal:
sudo apt-get install \
  gridengine-master gridengine-exec gridengine-common gridengine-qmon gridengine-client
Instead, you can run the shorter, and perhaps more error-prone, command
sudo apt-get install gridengine-*
A pop-up window will appear within the terminal during installation, with title "Configuring gridengine-common". A series of questions show up sequentially in this window:
- Question: "Configure SGE automatically?" Answer: highlight "<Yes>" and press "Enter".
- Question: "SGE cell name:" Answer: type "default", then press "Tab" to highlight "<Ok>" and press "Enter".
Note here that you are free to choose any name you want for your SGE cell instead of "default", such as "sge_cell" for example. If you alter the SGE cell name, you will have to subsequently set the SGE_CELL variable in your ~/.bashrc file accordingly (assuming that bash is your default shell). For instance, if you set the SGE cell name to be sge_cell, you will add the following line in your ~/.bashrc:
export SGE_CELL="sge_cell"
Furthermore, you will need to add the above line of code in your /root/.bashrc file so that the SGE cell is also known to the root. It is advised that you leave the SGE cell name as it is, holding the "default" value.
- Question: "SGE master hostname:" Answer: type "localhost", then press "Tab" to highlight "<Ok>" and press "Enter".
Instead of "localhost", you can choose the hostname of your computer, which can be found by running the "hostname" command from the terminal:
hostname
After answering these three questions, the pop-up window closes and the installation continues on the terminal. If for any reason you need to reconfigure the gridengine-master package, you can do so by invoking the following command:
sudo dpkg-reconfigure gridengine-master
The installation of gridengine is now complete, yet this does not mean that you are necessarily ready to use SGE. First of all, check whether sge_qmaster and sge_execd are running by using the command
ps aux | grep "sge"
The output I got verified that sge_qmaster and sge_execd are running:
sgeadmin  1310  0.0  0.1 135968  5376 ?        Sl   13:41   0:00 /usr/lib/gridengine/sge_qmaster
sgeadmin  1336  0.0  0.0  54760  1544 ?        Sl   13:41   0:00 /usr/lib/gridengine/sge_execd
1000      3171  0.0  0.0   7780   860 pts/0    S+   13:54   0:00 grep --colour=auto sge
If this is not the case for you, then start up sge_qmaster and sge_execd by executing the following three commands:
sudo su
sge_qmaster
sge_execd
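The check for the two daemons can also be scripted instead of read off the ps output by eye. The following is a minimal sketch, assuming a process list with one command name per line on standard input (for example from `ps -e -o comm=`); the function name `missing_sge_daemons` is hypothetical.

```shell
# Hypothetical helper: read command names (one per line) from stdin and
# print whichever of the two Grid Engine daemons is not among them.
missing_sge_daemons() {
    procs=$(cat)
    for d in sge_qmaster sge_execd; do
        printf '%s\n' "$procs" | grep -qx "$d" || echo "$d"
    done
}

# Usage:
#   ps -e -o comm= | missing_sge_daemons
```

No output means both daemons are running; any printed name is a daemon that still has to be started.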
Once you ensure that sge_qmaster and sge_execd are running, try to start qmon, the graphical user interface (GUI) for the administration of SGE:
sudo qmon
It is likely that the qmon window will not load, but instead you will get an error message. This is what I got:
Warning: Cannot convert string "-adobe-courier-medium-r-*--14-*-*-*-m-*-*-*" to type FontStruct
Warning: Cannot convert string "-adobe-courier-bold-r-*--14-*-*-*-m-*-*-*" to type FontStruct
Warning: Cannot convert string "-adobe-courier-medium-r-*--12-*-*-*-m-*-*-*" to type FontStruct
X Error of failed request: BadName (named color or font does not exist)
  Major opcode of failed request: 45 (X_OpenFont)
  Serial number of failed request: 643
  Current serial number in output stream: 654
The error message indicates that some fonts are missing. The package which contains the necessary fonts is called xfonts-75dpi. In my case, xfonts-75dpi was installed automatically alongside the installation of the gridengine packages. Nevertheless, I got the error message because the fonts were not loaded after their installation. So, I merely restarted my computer. After rebooting, the "sudo qmon" command loaded the qmon window. If xfonts-75dpi is not installed on your system, then install it using the following command and then reboot:
sudo apt-get install xfonts-75dpi
After having resolved any possible font-related issues "sudo qmon" should load the SGE admin window. If you let the window remain idle or if you try to press any of its buttons, such as "Job Control", the most likely event will be the appearance of a message pop-up window with the text "cannot reach qmaster". Click on the "Abort" button of the pop-up window to terminate qmon. Try also the qstat command, which in my case gave the following error message:
error: commlib error: access denied (client IP resolved to host name "localhost". This is not identical to clients host name "russell")
error: unable to contact qmaster using port 6444 on host "russell"
It is useful to read the error message alongside the /etc/hosts file of my system:
127.0.0.1 localhost
127.0.1.1 russell
# The following lines are desirable for IPv6 capable hosts
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
The hostname of my computer is "russell". According to the error message, SGE set the client hostname to "russell", whose address in /etc/hosts is the loopback address 127.0.1.1, while it set the client IP to 127.0.0.1, which is the loopback address assigned to the hostname "localhost". To resolve this ambiguity, I changed the first two lines of my /etc/hosts so that both hostnames "localhost" and "russell" share the same address (as a word of warning, make a backup of your /etc/hosts file before making any changes to it). To be more specific, I deleted the second line and appended the "russell" hostname to the end of the first line. My /etc/hosts file thus became:
127.0.0.1 localhost russell
# The following lines are desirable for IPv6 capable hosts
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
Moreover, it is possible that your /etc/hosts file contains by default the string "localhost.localdomain" in the first line, for example as in
127.0.0.1 localhost localhost.localdomain russell
If that's the case, make sure you remove "localhost.localdomain" so that only "localhost" and your machine's hostname ("russell" in my case) are tied to the address 127.0.0.1:
127.0.0.1 localhost russell
You may restart sge_qmaster and sge_execd, although it is not advised given that you made a fundamental change to your system's state by reconfiguring the association between IPs and hostnames in the /etc/hosts file. Instead, you are advised to restart your computer before you proceed any further. After rebooting, "qstat" and "sudo qmon" should run without returning any error messages.
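The mismatch described above can be detected before SGE complains about it. The sketch below looks up which address a hosts(5)-style file assigns to "localhost" and to the machine's own name and reports when they differ; the function names `hosts_addr` and `loopback_mismatch` are hypothetical, and only the file passed as an argument is consulted (DNS is ignored).

```shell
# Hypothetical helper: print the first address that a hosts-style file
# (argument 1) assigns to a name (argument 2). Comments are ignored.
hosts_addr() {
    awk -v n="$2" '
        { sub(/#.*/, "") }
        { for (i = 2; i <= NF; i++) if ($i == n) { print $1; exit } }
    ' "$1"
}

# Hypothetical helper: succeed if the given hostname is listed in the file
# with an address different from the one assigned to "localhost".
loopback_mismatch() {
    a=$(hosts_addr "$1" localhost)
    b=$(hosts_addr "$1" "$2")
    [ -n "$b" ] && [ "$a" != "$b" ]
}

# Usage:
#   if loopback_mismatch /etc/hosts "$(hostname)"; then
#       echo "hostname and localhost map to different addresses"
#   fi
```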
sites.google.com/site/woojay
V. Installing the Sun Grid Engine
I basically followed the installation instructions on the Grid Engine website to install qmaster via "./inst_sge -m". I used Padraig's specifications, which I am going to quote here:
I will just add that if the installer asks if you want to enable a JMX MBean server, you can answer no.
- Install as root user: You don't have to do this, but it simplifies the process of getting SGE to run on the DRBL nodes. Please note that in recommending this, I am presuming that your cluster network is private, and the nodes won't be accessible by non-privileged users (i.e. not you).
- Do opt to verify file permissions
- Select to use BerkeleyDB, but without a spool server
- Use the ID range suggested in the manual: 20000-20100
- Accept to install startup scripts
- Accept to load a file which contains the hostnames of your nodes. Here you enter the full path to file you created before running the install script.
- Use normal scheduling
After installation, I ran:
source /opt/oge/default/common/settings.sh
to configure various environment variables. I also added this command to my .bashrc file.
Installation of the Grid Engine Execution Host
Installation instructions for Son of Grid Engine 8.1.6. fsl.fmrib.ox.ac.uk
Copyright © 1996-2021 by Softpanorama Society. www.softpanorama.org was initially created as a service to the (now defunct) UN Sustainable Development Networking Programme (SDNP) without any remuneration. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License. Copyright in original materials belongs to their respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.
FAIR USE NOTICE This site contains copyrighted material the use of which has not always been specifically authorized by the copyright owner. We are making such material available to advance understanding of computer science, IT technology, economic, scientific, and social issues. We believe this constitutes a 'fair use' of any such copyrighted material as provided by section 107 of the US Copyright Law according to which such material can be distributed without profit exclusively for research and educational purposes.
This is a Spartan WHYFF (We Help You For Free) site written by people for whom English is not a native language. Grammar and spelling errors should be expected. The site contains some broken links, as it develops like a living tree...
Disclaimer:
The statements, views and opinions presented on this web page are those of the author (or referenced source) and are not endorsed by, nor do they necessarily reflect, the opinions of the Softpanorama society. We do not warrant the correctness of the information provided or its fitness for any purpose. The site uses AdSense, so you need to be aware of Google's privacy policy. If you do not want to be tracked by Google, please disable JavaScript for this site. This site is perfectly usable without JavaScript.
Last modified: December 26, 2017