We will assume that installation is performed on RHEL 6.5 or 6.6. We also assume that NFS is used for sharing files with the master host.
The degree of sharing is not that important, but generally $SGE_ROOT/$SGE_CELL should be shared. The efficiency considerations that are cited by many are overblown: without careful measurements that determine the real bottleneck, you might fall into the classic trap called "premature optimization". As Donald Knuth used to say, "Premature optimization is the root of all evil." Long before him, Talleyrand gave the following advice to young diplomats: "First and foremost, not too much zeal." Just substitute "novice SGE administrators" for "young diplomats".
The same issue applies to the choice between classic spooling and Berkeley DB. Without measurements, the selection of Berkeley DB is fool's gold.
Before installing an execution host, you first need to install and configure the master. The master should be running during installation of an execution host.
The installation process is pretty simple. If you share the whole $SGE_ROOT tree or the $SGE_ROOT/$SGE_CELL subtree via NFS, there is no need to run the execution daemon setup script, although this is the least time-consuming part of the installation; that's why you then need to register the daemon with chkconfig and start it manually. Here is how Dave Love explained the situation:
== Set Up ==
After installing the distribution into place you need to run the installation script `inst_sge` (or a wrapper) on your master host, and on all execution hosts which don't share the installation via a distributed file system. This will configure and start up the daemons.
If the installation is shared with the execution hosts, using a shared, writable spool directory, it isn't necessary to do an explicit install on them. Use `inst_sge` to install the execd on the master host and copy to or share with the hosts the `sgeexecd` init script.
Then start this after the qmaster has been configured with the relevant host names (with `qconf -ah`).
You may or may not want to keep the execd on the master host for maintaining global load values, for instance, but you probably want to ensure it has zero slots, so as not to be able to run jobs.
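For instance, here is a minimal sketch of zeroing out the slot count for the master host; the queue name all.q and the hostname master1 are assumptions, and the [host=value] syntax is the same one used with qconf -aattr later in this page:

# Give the master host zero slots in the queue so no jobs are scheduled there
qconf -mattr queue slots "[master1=0]" all.q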
You can install as many execution hosts as you want using the GUI installer: just add as many hosts as you wish and they will be installed one after another in a single batch.
On the execution host you need to check the following preconditions:
Register and patch the server. You might need access to a repository for the installation; I am not sure that the necessary RPMs are present on the RHEL installation DVD.
Update or install Java. Usually Java is already installed, but you still need to verify that; if it is not, you need to install it.
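A minimal check, assuming yum and the OpenJDK package name shipped with RHEL 6 (adjust the package name to the JDK your site uses):

# Install OpenJDK only if no java binary is on the PATH, then display the version
if ! command -v java >/dev/null 2>&1 ; then
    yum -y install java-1.7.0-openjdk
fi
java -version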
Configure NTP. Check using the command: ntpdate -u ntp1.firm.com
In addition, you need to perform several steps to ensure a proper environment for the execd.
For installation of an execution host you need a running Grid Engine qmaster. You can verify that it responds by listing the administrative hosts it knows about:

qconf -sh
You can add an administrative host with the command:
qconf -ah <hostname>
If you use an RPM-based installation, then mounting the NFS directory is a prerequisite for installing the RPMs, and as such it is already met. In the case of a tar-file based installation you need to do it now, if this is the first execution host in the cluster.
First you need to decide how much of the $SGE_ROOT tree you want to share. For small installations the master host can also serve as the NFS server, but of course you can get better results using a specialized server. Most simple SGE installations for small clusters share either the whole $SGE_ROOT tree or just the $SGE_ROOT/$SGE_CELL subtree.
For an RPM-based installation the second method is preferable, as you have already installed the binaries on the host: it makes no sense to put executables on NFS, so the minimum is the $SGE_ROOT/$SGE_CELL/common directory. Of course, in this case you need to install the executables on each execution host. But as installing the prerequisite RPMs is enough trouble (and you need to do it in any case), two more RPMs do not make much difference.
Generally, for small installations (say, fewer than 32 nodes and 640 cores) that do not have a heavy load of short (several minutes long) jobs, there is not much difference whether you share $SGE_ROOT or $SGE_ROOT/default. In this case sharing less does not really improve efficiency on modern servers. See Usage of NFS in Grid Engine.
Create the directory for shared files (for example /opt/sge/default) and put an appropriate line in the /etc/fstab file. For example:
export SGE_ROOT=/opt/sge   # SGE installation root directory
export SGE_CELL=default    # cell directory
SGE_MASTER=qmaster         # hostname of your SGE master host; should be in /etc/hosts

SGE_NFS_SHARE=$SGE_ROOT/$SGE_CELL   # specify how much you share
mkdir -p $SGE_NFS_SHARE
echo "$SGE_MASTER:$SGE_NFS_SHARE $SGE_NFS_SHARE nfs vers=3,rw,hard,intr,tcp,rsize=32768,wsize=32768 1 2" >> /etc/fstab
mount $SGE_NFS_SHARE
Now you need to add the host to the /etc/exports file on the master host (or the NFS server) if you share by host, not by netmask. For example:
/opt/sge 10.194.186.254(rw,no_root_squash) 10.194.181.26(rw,no_root_squash)
After that restart the NFS service so that it rereads the exports file:
# service nfs restart
Shutting down NFS mountd:    [ OK ]
Shutting down NFS daemon:    [ OK ]
Shutting down NFS quotas:    [ OK ]
Shutting down NFS services:  [ OK ]
Starting NFS services:       [ OK ]
Starting NFS quotas:         [ OK ]
Starting NFS daemon:         [ OK ]
Starting NFS mountd:         [ OK ]
If this was not done before, universally for all execution hosts, you need to create a passwordless login environment now. This might be the case when this is the first execution host you are installing. In all other cases the SSH keys were already generated, and you just need to copy /root/.ssh/authorized_keys from any working execution host:

cd /root/.ssh
scp sge01:/root/.ssh/authorized_keys .

Check SSH access from the master host to the node on which you are installing the execution host (b5 in the example below):
[0]root@m17: # ssh b5
The authenticity of host 'b5 (10.194.181.46)' can't be established.
RSA key fingerprint is 18:35:6e:96:11:77:27:fc:ac:1c:8e:46:36:2b:ae:2b.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'b5,10.194.181.46' (RSA) to the list of known hosts.
Last login: Thu Jul 26 08:29:41 2012 from sge_master.firma.net
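If no keys exist yet (the first-execution-host case), here is a minimal sketch for generating and distributing them; it assumes root SSH access and uses the example node b5:

# Generate a passwordless root key only if one does not exist yet
[ -f /root/.ssh/id_rsa ] || ssh-keygen -t rsa -N '' -f /root/.ssh/id_rsa
# Push the public key to the new execution host
ssh-copy-id root@b5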
Check that the sgeadmin user is present (if you are using it) and that it has GID and UID identical to the sgeadmin account on the master host. The execution host installer does not check that, and in the end your execution host will not be able to communicate with the master; you might not know what is wrong, as you missed this step and forgot about this tricky error. Typically RPM-based installations create this user as a part of the RPM install, but tar-based installations do not, and if you want to use it, you need to create it manually.
You also need to verify that the UID and GID of all users and application accounts are identical to those on the master host. There can be tricky errors, especially if the sgeadmin UID and GID are different or the user ID for an important application does not match.
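A quick hedged check of this consistency (assumes the account is named sgeadmin and that the master is reachable as qmaster over SSH):

# Compare the sgeadmin UID/GID on this host against the master
local_ids=$(id sgeadmin 2>/dev/null)
master_ids=$(ssh qmaster id sgeadmin 2>/dev/null)
if [ "$local_ids" != "$master_ids" ] ; then
    echo "WARNING: sgeadmin UID/GID mismatch with master"
    echo "  local:  $local_ids"
    echo "  master: $master_ids"
fi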
If this is a reinstallation, make sure that the old version of the execd daemon is not running and the new version of the qmaster daemon is running. Also remove stale startup files from /etc/init.d.
Check /etc/services. If you installed SGE from RPMs, such as the Son of Grid Engine RPMs, these entries are typically added by the RPMs. If necessary, manually add the ports that you used during configuration of the SGE master. For example:
sge_qmaster     6444/tcp   # Grid Engine Qmaster Service
sge_qmaster     6444/udp   # Grid Engine Qmaster Service
sge_execd       6445/tcp   # Grid Engine Execution Service
sge_execd       6445/udp   # Grid Engine Execution Service
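A minimal idempotent sketch for appending these entries (the 6444/6445 ports are the IANA defaults; use whatever ports you configured on the qmaster):

# Append SGE ports to /etc/services only if they are not already present
if ! grep -q '^sge_qmaster' /etc/services ; then
cat >> /etc/services <<'EOF'
sge_qmaster     6444/tcp   # Grid Engine Qmaster Service
sge_qmaster     6444/udp   # Grid Engine Qmaster Service
sge_execd       6445/tcp   # Grid Engine Execution Service
sge_execd       6445/udp   # Grid Engine Execution Service
EOF
fi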
On both the execution host and the master host, verify that the $SGE_ROOT and $SGE_CELL variables are set properly. If the $SGE_ROOT environment variable is not set on the execution host, set it by typing:
# SGE_ROOT=/opt/sge; export SGE_ROOT
Always manually confirm that you have set the $SGE_ROOT environment variable correctly. Type:
# echo $SGE_ROOT
Check that files are visible in the NFS-mounted directory and owned by either root or sgeadmin (depending on what you chose during installation of the qmaster):
cd $SGE_ROOT/$SGE_CELL && ls -l
On the master host: add the host to the list of execution hosts (this is not strictly necessary)
qconf -ae

The -ae option (add execution host) displays an editor that contains a configuration template for an execution host. The editor is either the default vi editor or the editor that corresponds to the EDITOR environment variable.
In this template you specify the hostname, which should be the name of the execution host we want to configure. In the vi screen change the name and save the template. See the host_conf(5) man page for a detailed description of the template entries to be changed.
hostname              template
load_scaling          NONE
complex_values        NONE
user_lists            NONE
xuser_lists           NONE
projects              NONE
xprojects             NONE
usage_scaling         NONE
report_variables      NONE
Depending on your distribution you might need to correct a couple of bugs.
There are two installers for Grid Engine: the command line installer (install_execd) and the GUI installer. The command line installer is invoked as:
./install_execd -nobincheck
In the open source distribution the GUI-based installer rarely works, and sometimes it is not even shipped with the distribution, so you are forced to use the command line installer. On commercial distributions the GUI installer typically works.
One advantage of the GUI-based installer is that it allows you to install multiple execution hosts at once. You can achieve the same effect with the command line installer by using Expect or by scripting the installation, as sketched below.
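A minimal sketch of scripting it over SSH (assumes passwordless root SSH to the nodes, $SGE_ROOT under /opt/sge, and SGE's automated installation mode, which is driven by a site-edited copy of the template shipped in util/install_modules; the file name and flags below should be checked against your SGE version):

# Hypothetical node list; adjust to your cluster
for node in b1 b2 b3 b4 b5 ; do
    ssh root@$node "cd /opt/sge && ./inst_sge -x -auto /opt/sge/util/install_modules/inst_template.conf"
done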
If you have both, you have the luxury of deciding which installation method is best for you (the GUI installer is pretty nice). Some considerations:
For detailed, step-by-step instructions see
On the execution host: You need to ensure two things:
Ensure a proper environment after reboot by sourcing the SGE environment in /etc/profile or by copying the file $SGE_ROOT/default/common/settings.sh to /etc/profile.d/sge.sh
Ensure that the execution daemon is registered using chkconfig and starts on reboot automatically.
. $SGE_ROOT/$SGE_CELL/common/settings.sh
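Alternatively, copying the settings file into /etc/profile.d accomplishes the same thing for all login shells:

cp $SGE_ROOT/$SGE_CELL/common/settings.sh /etc/profile.d/sge.sh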
# chkconfig sgeexecd.$SGE_CLUSTER_NAME on
# chkconfig --list sgeexecd.$SGE_CLUSTER_NAME
sgeexecd.p6444  0:off  1:off  2:on  3:on  4:on  5:on  6:off
# service sgeexecd.$SGE_CLUSTER_NAME start
starting sge_execd
The first start of the execution daemon takes two to three minutes or more. It's really slow even on a very fast server.
#!/bin/bash
#
# Post install operations for SGE execution host
#
. $SGE_ROOT/default/common/settings.sh

# Add sgeexecd.$SGE_CLUSTER_NAME (or whatever your cluster name is) to default services on levels 3 and 5
chkconfig sgeexecd.$SGE_CLUSTER_NAME on

# On the execution host: start the sge_execd service
service sgeexecd.$SGE_CLUSTER_NAME start

# Add necessary commands to /etc/profile
echo ". $SGE_ROOT/default/common/settings.sh" >> /etc/profile
On the master host: specify a queue for this host. That can be done either by adding the host to an existing queue or by copying an existing queue, renaming it, and saving it under the new name.
To add a new queue using an existing queue as a template, use the commands:
# qconf -sq c32.q > b2.q
Edit the relevant attributes in the new file, for example:

hostlist      lus
processors    32
slots         32
shell         /bin/bash
pe_list       ms
vi b2.q
# qconf -Aq b2.q
root@lus17 added "b2.q" to cluster queue list
See Creating and modifying SGE Queues
Verify that the execution host has been declared with the command
qconf -sel
which lists all execution hosts.
You can also use qconf -se <hostname> to see the parameters configured (usually only the hostname is configured). See Configuring Hosts From the Command Line.
On the execution host, check the running processes:

ps -ef | grep sge
Specifically, you should see that the sge_execd daemon is running.
If the daemon is not running, try to start it manually:

/sbin/service sgeexecd.p6444 start

If it starts correctly this way, you probably forgot to run the chkconfig command, so the daemon was not started at boot.
Grid Engine messages can be found in syslog during startup:
After startup the daemons log their messages in their spool directories.
Qmaster:
$SGE_ROOT/$SGE_CELL/spool/qmaster/messages
Exec daemon:
$SGE_ROOT/$SGE_CELL/spool/$EXEC_HOSTNAME/messages
where $EXEC_HOSTNAME is the hostname of the execution host we want to see the messages from.
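For example, to watch the execution daemon log on the node itself (assuming the default spool layout under the shared cell directory):

tail -f $SGE_ROOT/$SGE_CELL/spool/$(hostname -s)/messages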
Nov 08, 2018 | liv.ac.uk
I installed SGE on CentOS 7 back in January this year. If my recollection is correct, the procedure was analogous to the instructions for CentOS 6. There were some issues with the firewalld service (make sure that it is not blocking SGE), as well as some issues with SSL.
Check out these threads for reference: http://arc.liv.ac.uk/pipermail/sge-discuss/2017-January/001050.html
Max
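As a practical note on the firewalld issue mentioned above, here is a minimal sketch for opening the SGE ports (assumes the default 6444/6445 ports):

firewall-cmd --permanent --add-port=6444/tcp
firewall-cmd --permanent --add-port=6445/tcp
firewall-cmd --reload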
Aug 17, 2018 | www.rocksclusters.org
Operating System Base
Rocks 7.0 (Manzanita) x86_64 is based upon CentOS 7.4 with all updates available as of 1 Dec 2017.
Building a bare-bones compute cluster
- Boot your frontend with the kernel roll
- Then choose the following rolls: base , core , kernel , CentOS and Updates-CentOS
Building a more complex cluster

In addition to the above, select the following rolls:
Building Custom Clusters
- area51
- fingerprint
- ganglia
- kvm (used for virtualization)
- hpc
- htcondor (used independently or in conjunction with sge)
- perl
- python
- sge
- zfs-linux (used to build reliable storage systems)
If you wish to build a custom cluster, you must choose from our a la carte selection, but make sure to download the required base , kernel and both CentOS rolls. The CentOS rolls include CentOS 7.4 w/updates pre-applied. Most users will want the full updated OS so that other software can be added.
MD5 Checksums

Please double check the MD5 checksums for all the rolls you download.
Downloads

All ISOs are available for download from here. Individual links are listed below.
- kernel: Rocks Bootable Kernel Roll (required)
- base: Rocks Base Roll (required)
- core: Core Roll (required)
- CentOS: CentOS Roll (required)
- Updates-CentOS: CentOS Updates Roll (required)
- kvm: Support for building KVM VMs on cluster nodes
- ganglia: Cluster monitoring system from UCB
- area51: System security related services and utilities
- zfs-linux: ZFS On Linux Roll. Build and Manage Multi Terabyte File Systems.
- fingerprint: Fingerprint application dependencies
- hpc: Rocks HPC Roll
- htcondor: HTCondor High Throughput Computing (version 8.2.8)
- sge: Sun Grid Engine (Open Grid Scheduler) job queueing system
- perl: Support for Newer Version of Perl
- python: Python 2.7 and Python 3.x
- openvswitch: Rocks integration of OpenVswitch
Oct 15, 2017 | biohpc.blogspot.com
Installation of Son of Grid Engine(SGE) on CentOS7
SGE Master installation
master# hostnamectl set-hostname qmaster.local
master# vi /etc/hosts
192.168.56.101 qmaster.local qmaster
192.168.56.102 compute01.local compute01

master# mkdir -p /BiO/src
master# yum -y install epel-release
master# yum -y install jemalloc-devel openssl-devel ncurses-devel pam-devel libXmu-devel hwloc-devel hwloc hwloc-libs java-devel javacc ant-junit libdb-devel motif-devel csh ksh xterm db4-utils perl-XML-Simple perl-Env xorg-x11-fonts-ISO8859-1-100dpi xorg-x11-fonts-ISO8859-1-75dpi
master# groupadd -g 490 sgeadmin
master# useradd -u 495 -g 490 -r -m -d /home/sgeadmin -s /bin/bash -c "SGE Admin" sgeadmin
master# visudo
%sgeadmin ALL=(ALL) NOPASSWD: ALL
master# cd /BiO/src
master# wget http://arc.liv.ac.uk/downloads/SGE/releases/8.1.9/sge-8.1.9.tar.gz
master# tar zxvfp sge-8.1.9.tar.gz
master# cd sge-8.1.9/source/
master# sh scripts/bootstrap.sh && ./aimk && ./aimk -man
master# export SGE_ROOT=/BiO/gridengine && mkdir $SGE_ROOT
master# echo Y | ./scripts/distinst -local -allall -libs -noexit
master# chown -R sgeadmin.sgeadmin /BiO/gridengine
master# cd $SGE_ROOT
master# ./install_qmaster
press enter at the intro screen
press "y" and then specify sgeadmin as the user id
leave the install dir as /BiO/gridengine
You will now be asked about port configuration for the master, normally you would choose the default (2) which uses the /etc/services file
accept the sge_qmaster info
You will now be asked about port configuration for the execution daemon; normally you would choose the default (2), which uses the /etc/services file
accept the sge_execd info
leave the cell name as "default"
Enter an appropriate cluster name when requested
leave the spool dir as is
press "n" for no windows hosts!
press "y" (permissions are set correctly)
press "y" for all hosts in one domain
If you have Java available on your Qmaster and wish to use SGE Inspect or SDM then enable the JMX MBean server and provide the requested information - probably answer "n" at this point!
press enter to accept the directory creation notification
enter "classic" for classic spooling (berkeleydb may be more appropriate for large clusters)
press enter to accept the next notice
enter "20000-20100" as the GID range (increase this range if you have execution nodes capable of running more than 100 concurrent jobs)
accept the default spool dir or specify a different folder (for example if you wish to use a shared or local folder outside of SGE_ROOT
enter an email address that will be sent problem reports
press "n" to refuse to change the parameters you have just configured
press enter to accept the next notice
press "y" to install the startup scripts
press enter twice to confirm the following messages
press "n" for a file with a list of hosts
enter the names of your hosts who will be able to administer and submit jobs (enter alone to finish adding hosts)
skip shadow hosts for now (press "n")
choose "1" for normal configuration and agree with "y"
press enter to accept the next message and "n" to refuse to see the previous screen again, and then finally enter to exit the installer

master# cp /BiO/gridengine/default/common/settings.sh /etc/profile.d/
master# qconf -ah compute01.local
compute01.local added to administrative host list

master# yum -y install nfs-utils
master# vi /etc/exports
/BiO 192.168.56.0/24(rw,no_root_squash)

master# systemctl start rpcbind nfs-server
master# systemctl enable rpcbind nfs-server

SGE Client installation
compute01# yum -y install hwloc-devel
compute01# hostnamectl set-hostname compute01.local
compute01# vi /etc/hosts
192.168.56.101 qmaster.local qmaster
192.168.56.102 compute01.local compute01

compute01# groupadd -g 490 sgeadmin
compute01# useradd -u 495 -g 490 -r -m -d /home/sgeadmin -s /bin/bash -c "SGE Admin" sgeadmin
compute01# yum -y install nfs-utils
compute01# systemctl start rpcbind
compute01# systemctl enable rpcbind
compute01# mkdir /BiO
compute01# mount -t nfs 192.168.56.101:/BiO /BiO
compute01# vi /etc/fstab
192.168.56.101:/BiO /BiO nfs defaults 0 0

compute01# export SGE_ROOT=/BiO/gridengine
compute01# export SGE_CELL=default
compute01# cd $SGE_ROOT
compute01# ./install_execd
compute01# cp /BiO/gridengine/default/common/settings.sh /etc/profile.d/
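After the execution daemon installation completes, a quick hedged way to verify that the node joined the cluster (run on the master, with the settings file sourced; hostnames follow the example above):

master# . /BiO/gridengine/default/common/settings.sh
master# qhost -h compute01.local
master# qstat -f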
Mar 02, 2016 | liv.ac.uk
README
This is Son of Grid Engine version v8.1.9.
See <http://arc.liv.ac.uk/repos/darcs/sge-release/NEWS> for information on recent changes. See <https://arc.liv.ac.uk/trac/SGE> for more information.
The .deb and .rpm packages and the source tarball are signed with PGP key B5AEEEA9.
* sge-8.1.9.tar.gz, sge-8.1.9.tar.gz.sig: Source tarball and PGP signature
* RPMs for Red Hat-ish systems, installing into /opt/sge with GUI installer and Hadoop support:
  * gridengine-8.1.9-1.el5.src.rpm: Source RPM for RHEL, Fedora
  * gridengine-*8.1.9-1.el6.x86_64.rpm: RPMs for RHEL 6 (and CentOS, SL)
  See <https://copr.fedorainfracloud.org/coprs/loveshack/SGE/> for hwloc 1.6 RPMs if you need them for building/installing RHEL5 RPMs.
* Debian packages, installing into /opt/sge, not providing the GUI installer or Hadoop support:
  * sge_8.1.9.dsc, sge_8.1.9.tar.gz: Source packaging. See <http://wiki.debian.org/BuildingAPackage>, and see <http://arc.liv.ac.uk/downloads/SGE/support/> if you need (a more recent) hwloc.
  * sge-common_8.1.9_all.deb, sge-doc_8.1.9_all.deb, sge_8.1.9_amd64.deb, sge-dbg_8.1.9_amd64.deb: Binary packages built on Debian Jessie.
  * debian-8.1.9.tar.gz: Alternative Debian packaging, for installing into /usr.
* arco-8.1.6.tar.gz: ARCo source (unchanged from previous version)
* dbwriter-8.1.6.tar.gz: compiled dbwriter component of ARCo (unchanged from previous version)

More RPMs (unsigned, unfortunately) are available at <http://copr.fedoraproject.org/coprs/loveshack/SGE/>.
I've done this a couple of times by now, and I always forget one step or another. Most of the information is on http://verahill.blogspot.com.au/2012/06/setting-up-sun-grid-engine-with-three.html but here it is in a briefer form:
In the example I've used krypton as the node name, and 192.168.1.180 as the IP.
My front node is called beryllium and has an IP of 192.168.1.1. On the front node:
Add the new node name to the front node/queue master.

Add execution host:

qconf -ae

which opens a text file in vim. Edit the hostname (krypton) but nothing else. Saving returns:

added host krypton to exec host list

Add krypton as a submit host:

qconf -as krypton
krypton added to submit host list

Doing this before touching the node makes life a little bit easier.

1. Edit /etc/hosts on the node
Leave

127.0.0.1 localhost

but remove

127.0.1.1 krypton

and make sure that it says

192.168.1.180 krypton

instead. Throw in

192.168.1.1 beryllium

as well.

2. Install SGE on the node

sudo apt-get install gridengine-exec gridengine-client

You'll be asked about:

Configure automatically: yes
Cell name: rupert
Master hostname: beryllium

3. Add node to queue and group
I maintain separate queues and groups depending on how many cores each node has. See e.g. http://verahill.blogspot.com.au/2012/06/setting-up-sun-grid-engine-with-three.html for how to create queues and groups. If they already exist, just do

qconf -aattr hostgroup hostlist krypton @fourcores
qconf -aattr queue slots "[krypton=4]" fourcores.q

to add the new node.

4. Add pe to queue if necessary
Since I have different queues depending on the number of cores of a node, I tend to have to fiddle with this.
See e.g. http://verahill.blogspot.com.au/2012/06/setting-up-sun-grid-engine-with-three.html for how to create pe:s.
If the pe you need is already created, you can do
qconf -mq fourcores.q

and edit pe_list.
5. Check
On the front node, do:

qhost

HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
-------------------------------------------------------------------------------
global                  -               -     -       -       -       -       -
beryllium               lx26-amd64      3  0.16    7.8G    5.3G   14.9G  398.2M
boron                   lx26-amd64      6  6.02    7.6G    1.6G   14.9G     0.0
helium                  lx26-amd64      2     -    2.0G       -    1.9G       -
lithium                 lx26-amd64      3     -    3.9G       -     0.0       -
neon                    lx26-amd64      8  8.01   31.4G    1.3G   59.6G     0.0
krypton                 lx26-amd64      4  4.01   15.6G    2.8G   14.9G     0.0
This section describes installing the Sun Grid Engine software. These instructions are streamlined for installations particular to the Sun Shared Visualization 1.1 software.
Complete Sun Grid Engine documentation, including an installation guide, is available at:
http://docs.sun.com/app/docs/coll/1017.3
To Prepare to Install the Sun Grid Engine SoftwareThis procedure is for installations on all Solaris and Linux servers.
1. Determine which host is to be the queue master (qmaster) and which host is to be the NFS server for your grid.
If the resources are available, the same host can perform both roles.
2. Determine which hosts are to be the execution hosts for your grid.
If the resources are available and these systems are configured with graphics accelerators, the execution hosts can also be the graphics servers.
Note - Execution hosts need the korn shell, ksh. Solaris hosts include ksh by default, but Linux hosts might need ksh to be installed.
3. Determine your installation root directory.
The package default is /gridware/sge; however, the Sun Grid Engine documentation calls this <sge_root> or /sge_root. These instructions use the variable $SGE_ROOT.
4. Become superuser of the NFS server and declare the variable:
# setenv SGE_ROOT /gridware/sge

If you chose a different installation root directory in Step 3, type that directory name instead of /gridware/sge.
5. Create the base directory for $SGE_ROOT if the path has multiple directory components:
# mkdir /gridware

6. Determine an SGE administrative login that can be used on all systems intended to be administration hosts.
For example, you might plan to use these parameters:
- Name: sgeadmin
- Group: adm (4)
- Home directory: $SGE_ROOT, or /gridware/sge (if that is your SGE_ROOT choice)
- User ID: 530
The Sun Grid Engine administrator can have a different user ID than sgeadmin. However, the administrative user ID (530 in this example) must be available across all hosts in the grid.On SuSE hosts, group 4 (adm) might not already be defined in /etc/group. In that case, you need to add that group.
7. Create the sgeadmin user on the NFS server for your grid.
Use the values you selected in Step 6, as in this example:
# useradd -u 530 -g 4 -d $SGE_ROOT -m -s /bin/tcsh -c "Sun Grid Engine Admin" sgeadmin

8. Assign the sgeadmin user a password:
# passwd sgeadmin

9. Append the following lines to the sgeadmin .cshrc file:
if ( $?prompt == 1 ) then
  if ( -f /gridware/sge/default/common/settings.csh ) then
    source /gridware/sge/default/common/settings.csh
  endif
endif

Replace /gridware/sge with the value of $SGE_ROOT if different.
Note that the literal path, not the $SGE_ROOT variable, must be used in Step 9, as the variable will not be set in a fresh shell until the settings.csh file is sourced.
You might choose to do the same for root's .cshrc or .tcshrc, or the equivalent file for root's shell.

10. Continue the installation of software on the NFS server by performing one of these procedures:
To Install the Software on a Linux System

1. Permit $SGE_ROOT to be shared (exported) by the NFS server.
If your base directory of $SGE_ROOT is already shared, you do not need to perform this step.
On the Linux NFS server, append the following line to the /etc/exports file:
/gridware *(rw,sync,no_root_squash)

where /gridware is the base directory of your $SGE_ROOT.
2. Inform the operating system of the changes you have made:
- For SuSE Linux:
# /etc/init.d/nfs*server stop ; /etc/init.d/nfs*server start

- For Red Hat Linux:
# /etc/init.d/nfs restart

3. If the system automounts using the hosts map, you can test the accessibility of the $SGE_ROOT directory from other systems on the network with this command:
# ls /net/nfsserverhostname/$SGE_ROOT

4. From each server in the grid, access the NFS server's $SGE_ROOT as each server's $SGE_ROOT using /etc/vfstab, /etc/fstab, or automounting.
Note - Submit hosts (client machines) also need to mount the NFS server's $SGE_ROOT. Execution hosts must not mount the NFS server with the nosuid option, as setuid is needed by Sun Grid Engine's rlogin and rsh for its qrsh command to work properly.
a. Add the following line to the /etc/fstab file:
nfsserverhostname:/gridware /gridware nfs auto,suid,bg,intr 0 0

Your Linux system might also need the no_root_squash option in this line.
b. Type these two commands:
# mkdir /gridware
# mount /gridware

where /gridware is the base directory of your $SGE_ROOT.
Note - If you use NIS to resolve host names, add the server's name to the /etc/hosts file and ensure that files is in the hosts entry in the /etc/nsswitch.conf file. Mounting occurs before the NIS name service is started. The first hostname on the /etc/hosts line for the execution host itself should not include a domain.
5. Determine port numbers.
You must determine an available port on the qmaster system. Sun Grid Engine components will use this port to communicate with the qmaster daemon. This port must be a single port number that is available on all current or prospective submit and execution hosts in your grid.
These port numbers can be any value, but the following port numbers have been assigned by the Internet Assigned Number Authority (IANA):
- sge_qmaster 6444/tcp
- sge_execd 6445/tcp
Note - For more information about IANA, see:
http://www.iana.org/assignments/port-numbers
If you are running a firewall on any execution host, ensure that the execution daemon's port allows traffic in.

6. Communicate the port numbers to the hosts.
These port numbers can be communicated to the hosts involved either by inserting the port numbers into every host's /etc/inet/services or /etc/services file or by setting Sun Grid Engine environment variables. The latter method, detailed in Step 4 of To Complete the Software Installation, is more convenient, because each Sun Grid Engine user already needs to use a Sun Grid Engine environment setup file. If you allow Sun Grid Engine to use this setup file, you will not have to add sge entries into every host's services file.
To use this environment variable technique, set these environment variables before you invoke ./install_qmaster in Step 2 of To Complete the Software Installation. Use the port numbers determined in Step 5 in place of 6444 and 6445 in these commands:
# setenv SGE_QMASTER_PORT 6444
# setenv SGE_EXECD_PORT 6445

The lines you include in the setup file for Sun Grid Engine will be executed by Step 5 of To Complete the Software Installation. (After installation, you will need to ensure that the setup file's set and export environment variables are naming SGE_QMASTER_PORT and SGE_EXECD_PORT.)
7. As superuser of the NFS server, install the Sun Grid Engine packages into $SGE_ROOT.
The NFS server will need both Sun Grid Engine architecture-independent common files and architecture-dependent files for the architecture of every submit and execution host. (Each architecture is a pairing of processor instruction set and operating system.) You might also choose to install documentation files.
These files can be installed from RPM packages on a Linux system. Files for additional nonnative architectures need to be installed from tar bundles, which is explained in Step 1 in To Complete the Software Installation.
Refer to TABLE 3-2, which lists commonly used Sun Grid Engine 6.1 Linux software RPM packages and the download files that contain those packages. If you are installing a release other than Sun Grid Engine 6.1, the download file names will refer to that version instead of reading 6_1. Also, newer versions of Sun Grid Engine might use file names that say sge instead of n1ge.
TABLE 3-2 Sun Grid Engine 6.1 Linux Software RPM Packages

- Common (sun-nlge-common-6.1.0.noarch.rpm): Sun Grid Engine architecture-independent common files, including documentation files
- X64 (sun-nlge-bin-linux24-x64-6.1.0.x86_64.rpm): Linux kernel 2.4 or 2.6, glibc >= 2.3.2, for AMD Opteron or Intel EM64T
- X86 (sun-nlge-bin-linux24-i586-6.1.0.i386.rpm): Linux kernel 2.4 or 2.6, glibc >= 2.3.2, for 32-bit x86
- Common but Optional (sun-nlge-arco-6.1.0.noarch.rpm): Accounting and Reporting Console (ARCo) for all architectures, not needed for the core product (optional)
To install each of the RPM packages you selected, type an rpm command line such as this:
# rpm -iv /path-to-rpm-file/sun-nlge-rest-of-filename.rpm

8. Perform the steps in To Complete the Software Installation.
To Complete the Software Installation

This procedure is for installations on all Solaris and Linux servers.
1. Install additional Sun Grid Engine tar bundles of files needed by hosts with a different operating system than the NFS server.
TABLE 3-3 lists Sun Grid Engine 6.1 software tar bundles, which can install nonnative software on a Solaris or Linux NFS server. Use these bundles to install software on an NFS server as needed to support hosts with a different operating system. (Newer versions of Sun Grid Engine might use file names that say sge instead of n1ge.)
TABLE 3-3 Sun Grid Engine 6.1 Software tar Bundles

- nlge-common.tar.gz: Architecture independent files (required, but was already installed from packages on the NFS server)
- nlge-6_1-bin-linux24-amd64.tar.gz: Linux kernel 2.4 or 2.6, glibc >= 2.3.2, for AMD Opteron and Intel EM64T
- nlge-6_1-bin-linux24-i586.tar.gz: Linux kernel 2.4 or 2.6, glibc >= 2.2.5, for 32-bit x86
- nlge-6_1-bin-solaris-sparcv9.tar.gz: Solaris 8 and higher, for 64-bit SPARC
- nlge-6_1-bin-solaris-i586.tar.gz: Solaris 9 and higher, for 32-bit x86
- nlge-6_1-bin-solaris-x64.tar.gz: Solaris 10, for 64-bit x64 (such as AMD Opteron)
- nlge-6_1-bin-windows-x86.tar.gz: Microsoft Windows[1]
- nlge-6_1-arco.tar.gz: Accounting and Reporting Console (ARCo) for all architectures, not needed for the core product
- swc_linux_2.2.5.tar.gz: Sun Web Console, required for ARCo, Linux, for 32-bit x86
- swc_solx86_2.2.5.tar.gz: Sun Web Console, required for ARCo, Solaris, for x86
- swc_sparc_2.2.5.tar.gz: Sun Web Console, required for ARCo, Solaris, for 64-bit SPARC
After you download the additional software you need, you can install the contents of each tar.gz file in the $SGE_ROOT directory with a command such as this:
# gunzip -c nlge-6_1-platform.tar.gz | (cd $SGE_ROOT; tar xf -)

If you installed any of the tar bundles mentioned in this step, you will need to answer n when the installation script asks (as in Step 3):
Did you install this version with >pkgadd< or did you already verify and set the file permissions of your distribution (enter: y)

2. On the queue master host, type:
# cd $SGE_ROOT ; ./install_qmaster

The Sun Grid Engine installation script begins.
3. The script prompts you for information and requests confirmation of selected values.
As you progress through the script, consider the following:
- The Sun N1 Grid Engine 6 Installation Guide has a table to help plan and record the answers to the questions asked during installation. For the simplest installation, accept all the defaults not discussed in the following text, unless your $SGE_ROOT is not /gridware/sge.
- The installation script asks: "Do you want to install Grid Engine as admin user >sgeadmin<? (y/n)". Answer y, so that all spool files are created as owned by that user. This answer avoids a problem where an execution host's root becomes nobody over NFS and therefore cannot access the spooling directories.
Note - The installation script might instead ask this question: "Do you want to install Grid Engine under a user id other than >root<? (y/n) [y]". Answer y. Later, you are asked for the user ID, which can be sgeadmin (as created in Step 6 of To Prepare to Install the Sun Grid Engine Software).
- The installation script asks: "Did you install this version with >pkgadd< or did you already verify and set the file permissions of your distribution (enter: y)". If you installed exclusively from packages, answer y. If you installed even partially from tar files (as in Step 1) or other means, answer n, and the install_qmaster script sets the file permissions appropriately.
- The installation script asks: "Are all hosts of your cluster in a single DNS domain (y/n)". Unless you are certain that you need domain checking, answer y. Sun Grid Engine then ignores domain components when comparing hostnames.
Execution hosts and the queue master must agree on the primary name of the execution host. If the execution host and the queue master do not agree on hostnames, a host_aliases file in the $SGE_ROOT directory enables SGE to understand that certain names are equivalent. For example, a host_aliases file might include this line:
myhost1 my1 myhost1-ib my1-ib

Every host name on this line is considered equivalent to the first name on the line (myhost1), which is the primary host name. For more details, see the Sun Grid Engine man page for host_aliases(5).
In addition, Sun Grid Engine requires that a host's unique hostname is associated with a true IP address, not the localhost address 127.0.0.1.
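A quick sanity check for this requirement, as a sketch (uses getent, available on Linux hosts):

# Warn if the host's own name resolves to the loopback address
getent hosts $(hostname) | grep -q '^127\.' && echo "WARNING: hostname resolves to loopback"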
- Select to use the BerkeleyDB, but do not configure a separate BerkeleyDB server.
- If your site uses NIS, a usable group ID range can be determined by studying the output of:
# ypcat -k group.bygid | sort -n | more

Or, ask your administrator for a reasonable range of unused group IDs. Sun Grid Engine uses the group IDs for each of the parallel jobs that are running at a given time.
- When prompted for administrative and submit hosts, include the name of the queue master host as an administrative and submit host, unless you forbid submissions from that host.
- You can create a shadow host that takes over for the qmaster if it becomes unavailable. This action is optional.
- Use the following command to add administrative hosts (which might be configured to be execution hosts) if those hosts were omitted:
# qconf -ah hostname,anotherhost

- You can display the administrative host list by typing:

# qconf -sh

- You can add submit hosts by typing:

# qconf -as myhost,anotherhost,stillmore

- Typing the following displays the submit host list:

qconf -ss

4. Update environment variables in settings files.
If you decided to communicate the port numbers to all SGE hosts using SGE's environment setup file, you now need to assure that SGE sets the correct port numbers for environment variables SGE_QMASTER_PORT and SGE_EXECD_PORT. (You would have made that choice at Step 6 of To Install the Software on a Solaris System or Step 6 of To Install the Software on a Linux System, and would have determined the port numbers in the step before these steps.)
You might find that the proper variable values were written when you ran install_qmaster.
a. Edit the SGE settings file for csh or tcsh.
The file is $SGE_ROOT/default/common/settings.csh.
b. In the settings.csh file, look for lines such as these:
unsetenv SGE_QMASTER_PORT
unsetenv SGE_EXECD_PORT

If you find such lines, change them to use your port numbers.
You determined the port numbers in Step 5 of To Install the Software on a Solaris System or Step 5 of To Install the Software on a Linux System. For example, change the lines to the following:
setenv SGE_QMASTER_PORT 6444
setenv SGE_EXECD_PORT 6445

c. Edit the SGE settings file for sh, bash, and ksh.
The file is $SGE_ROOT/default/common/settings.sh
d. In the settings.sh file, look for lines such as these:
unset SGE_QMASTER_PORT
unset SGE_EXECD_PORT

If you find such lines, change them to use your port numbers.
For example, change the lines to the following:
SGE_QMASTER_PORT=6444; export SGE_QMASTER_PORT
SGE_EXECD_PORT=6445; export SGE_EXECD_PORT

The settings files contain the lines to unset these environment variables by default. This default behavior is desirable if you had instead decided to enter the port numbers in every SGE host's /etc/services or /etc/inet/services file.
5. Source the file to set up your environment to use Sun Grid Engine.
- For tcsh/csh users, type:
% source /gridware/sge/default/common/settings.csh

Substitute /gridware/sge with your value of $SGE_ROOT. Consider having root's .login do so.
- For sh/bash/ksh users, type:
$ . /gridware/sge/default/common/settings.sh

Substitute /gridware/sge with your $SGE_ROOT. Consider having root's .profile or .bashrc do so.
6. Create the sgeadmin user on each of the other administration hosts of the grid:
# useradd -u 530 -g 4 -d $SGE_ROOT -s /bin/tcsh -c "Sun Grid Engine Admin" sgeadmin
Note - Unlike Step 7 of To Prepare to Install the Sun Grid Engine Software, the -m option is not needed for these other administration hosts. Assign the sgeadmin a password, as in Step 8 of that procedure.
Alternatively, you can add the sgeadmin entries to the respective /etc/passwd and /etc/shadow files.

7. As superuser on every execution host, set the SGE_ROOT environment variable and then type:
# cd $SGE_ROOT ; ./install_execd

You might need to create the execution host's default spooling directory. As superuser on the NFS server, type:
# mkdir $SGE_ROOT/default/spool/exec-hostname

The same value for exec-hostname is needed in the procedure To Set Up Sun Grid Engine Environment Variables.
8. After the environment is set up, submit a test job.
To specify the job to execute on your host:
exechost% qsub -q all.q@`hostname` $SGE_ROOT/examples/jobs/simple.sh
exechost% qstat -f

Job output and errors are in the initiating user's home directory, with filenames similar to the following:
simple.sh.e1 simple.sh.o1
Note - If you run the job as root, these files are in the execution host's root directory. If you do not know which host executed the job, you do not know which root directory the files are in. Therefore, submit jobs as a user whose home directory is in one place irrespective of execution host or specify the execution hostname explicitly.
To Set Up Sun Grid Engine Environment Variables

Use one of the following commands:
- For tcsh and csh users, type:
% source /gridware/sge/default/common/settings.csh

Substitute /gridware/sge with your $SGE_ROOT.
- For sh, bash, and ksh users, type:
$ . /gridware/sge/default/common/settings.sh

Substitute /gridware/sge with your $SGE_ROOT.
Note - These commands add $SGE_ROOT/bin/$ARCH to $path, add $SGE_ROOT/man to $MANPATH, set $SGE_ROOT, and if needed set $SGE_CELL and $COMMD_PORT.
Messages from Sun Grid Engine can be found in:
- /tmp/qmaster_messages (during Sun Grid Engine queue master startup)
- /tmp/execd_messages (during Sun Grid Engine exec daemon startup)
After the startup the daemons log messages in the spool directories.
- Sun Grid Engine queue master:
$SGE_ROOT/default/spool/qmaster/messages
- Sun Grid Engine execution daemon:
$SGE_ROOT/default/spool/exec-hostname/messages
To Verify Your Administrative Hosts

# qconf -sh

To Add Administrative Hosts

qconf -ah hostname

To Obtain Current Status

qstat -f
Note - In the status display, BIP means that queue permits batch, interactive, and parallel jobs. Also, the status au means the execution host daemon (execd) is not successfully running and communicating with the qmaster process.