The utility pdsh can run remote commands on multiple hosts in parallel. It uses a "sliding
window" (or fanout) of threads to conserve resources on the initiating host while allowing some
connections to time out. The current version is 2.34 (2020-02-07); it can be downloaded from the
chaos/pdsh repository on GitHub ("a high performance, parallel remote shell utility").
The pdsh distribution also contains:
A parallel remote copy utility, pdcp (copies from the local host to a group of remote hosts in parallel);
A reverse parallel remote copy utility, rpdcp (copies from a group of hosts to the local host in parallel);
The Perl script dshbak for formatting and demultiplexing pdsh output. The script was initially
written at the University of California and enhanced at Lawrence Livermore National
Laboratory.
The script dshbak is important because by default pdsh mixes output from different hosts, so you should normally run
pdsh ... | dshbak
Unless the output is a single line per host, always use this script for post-processing pdsh output.
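The core of what dshbak does can be sketched in a few lines of awk. This is only a rough stand-in to illustrate the idea (the function name demux is made up for this example); the real dshbak also consolidates identical output with -c and offers other refinements.

```shell
# Minimal dshbak-style demultiplexer: regroup "host: line" records so that
# each host's output appears together under a header (illustrative only).
demux() {
    awk '
        {
            host = $0; sub(/:.*/, "", host)          # text before the first colon
            line = $0; sub(/^[^:]*: ?/, "", line)    # text after "host: "
            if (!(host in lines)) order[++n] = host  # remember first-seen order
            lines[host] = lines[host] line "\n"      # accumulate per-host output
        }
        END {
            for (i = 1; i <= n; i++)
                printf "----------------\n%s\n----------------\n%s", order[i], lines[order[i]]
        }
    '
}
```

A hypothetical usage would be `pdsh -av uptime | demux`, but in practice you should simply pipe to the real dshbak.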
Pdsh is written in C and licensed under GPL 2.0. It was originally a rewrite of IBM dsh(1)
by Jim Garlick <[email protected]> on LLNL's ASCI Blue-Pacific
IBM SP system. Essentially it is a variant of the rsh(1) command, adapted for multiple
target hosts. It has become popular on HPC clusters, and used within Perl or Python scripts it can beat Ansible
at many tasks. It is simpler, and it can accommodate tasks that do not fit the waterfall pattern which Ansible, as a conceptual
descendant of IBM JCL adapted for parallel execution on multiple hosts, enforces on its users.
It uses a sliding window of threads to
execute remote commands, conserving socket resources while allowing some connections to time out if needed.
It is a high-performance tool suitable for very large HPC clusters, although development activity has slowed
considerably since 2013, with only occasional releases. An RPM is available for RHEL and its derivatives (via the EPEL repository), but it is not installed by default.
It is generally more common
in large HPC cluster environments than in regular datacenters, but its basic functionality serves the needs
of any Unix/Linux sysadmin for parallel execution of commands on multiple nodes and copying files to
and from the headnode (using pdcp
and rpdcp). Its functionality is very similar to C3 Tools and other such utilities, but it has
unique options that are valuable for very large clusters. Because
it is written in C, its maintenance is more difficult and the chances of it becoming open-source abandonware are
higher. For a Python-based rewrite, see ClusterShell.
Pdsh also implements dynamically loadable modules for extended functionality such as new remote shell
services and remote host selection.
NOTE:
Output from each host is displayed as it is received (which means records from different hosts will be mixed) and is
prefixed with the name of the host and a ':' character, unless the -N option is used. Output is unsorted,
and you need to sort it by hostname to gather the information for each node. Unix sort is not guaranteed to be stable, so individual records can
end up in the wrong order this way. So, as mentioned before, you should use the Perl script dshbak provided in the pdsh
distribution.
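If dshbak is not at hand, GNU sort offers a workaround for the stability problem: restrict the sort key to the host field and request a stable sort with -s, which preserves the original relative order of each host's records. (This assumes GNU sort; the -s flag may behave differently elsewhere.)

```shell
# Group interleaved pdsh-style output by host without reordering the records
# of any single host: sort only on the host field, stably (-s).
printf '%s\n' \
    'host2: first line from host2' \
    'host1: first line from host1' \
    'host2: second line from host2' \
    'host1: second line from host1' |
    sort -s -t: -k1,1
# Expected output:
# host1: first line from host1
# host1: second line from host1
# host2: first line from host2
# host2: second line from host2
```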
The standard options used to set the target hosts in pdsh are -w
and -x, which set and exclude hosts respectively. More complex definitions can be created using the so-called genders database,
activated with the option
-g.
NOTE: it is not necessary to use pdsh with the genders database. Other methods are also available and are
powerful enough to satisfy all but the most complex sysadmin needs.
The -w option is used to set and/or filter the list of target hosts, and is used as
-w TARGETS...
where TARGETS is a comma-separated list of one or more of the following:
Normal host names, e.g. -w host0,host1,host2...
A single '-' character, in which case the list of hosts will be read on STDIN.
Using the option -w you can specify lists of hosts in the general form prefix[n-m,l-k,...], where n < m and l < k, etc., as an
alternative to explicit lists of hosts. This form should not be confused with regular expression character classes (also denoted by
''[]''). For example, foo[19] does not represent an expression matching foo1 or foo9, but rather represents the hostlist foo19.
The hostlist syntax used with -w is meant only as a convenience on clusters with a "prefixNNN" naming convention,
and specification of ranges is never required: the hosts foo1,foo9 could be specified explicitly
as such, or as the hostlist foo[1,9].
Some examples of usage:
Run command on foo01,foo02,...,foo05
pdsh -w foo[01-05] command
Run command on foo7,foo9,foo10
pdsh -w foo[7,9-10] command
Run command on foo0,foo4,foo5
pdsh -w foo[0-5] -x foo[1-3] command
A suffix on the hostname is also supported:
Run command on foo0-eth0,foo1-eth0,foo2-eth0,foo3-eth0
pdsh -w foo[0-3]-eth0 command
NOTE: some shells will interpret brackets ('[' and ']') for pattern matching.
Depending on your shell, it may be necessary to enclose ranged lists within quotes. For example,
in tcsh, the first example above should be executed as:
pdsh -w "foo[01-05]" command
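To make the bracket expansion concrete, here is a tiny bash sketch of how a single prefix[n-m] range unfolds. The function name expand_hostlist is invented for illustration; pdsh's real hostlist parser also handles comma-separated ranges, multiple brackets, and suffixes.

```shell
# Expand a single prefix[n-m] hostlist expression, preserving zero padding.
# Illustrative only: real pdsh hostlists are far more general.
expand_hostlist() {
    local spec="$1"
    local prefix="${spec%%\[*}"                     # text before '['
    local range="${spec#*\[}"; range="${range%\]}"  # text inside '[...]'
    local lo="${range%-*}" hi="${range#*-}"
    local n
    for ((n = 10#$lo; n <= 10#$hi; n++)); do
        printf '%s%0*d\n' "$prefix" "${#lo}" "$n"   # keep leading-zero width
    done
}
```

For example, `expand_hostlist 'foo[01-03]'` prints foo01, foo02, and foo03, one per line.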
A range of hosts may also be specified in HOSTLIST format. The hostlist format is an optimization
for clusters of hosts that share a numeric suffix. The range of target hosts is specified in brackets
after the hostname prefix, e.g. foo[1-5].
If any argument is preceded by a single '^' character, then the argument is taken
to be the path to a file containing a list of hosts, one per line.
Read hosts from /tmp/hosts
pdsh -w ^/tmp/hosts ...
Also works for multiple files:
pdsh -w ^/tmp/hosts,^/tmp/morehosts ...
If the item begins with a '/' character, then it is a regular expression on which
to filter the list of hosts. (The regex argument may be optionally followed by a trailing '/',
e.g. '/node.*/').
Select only hosts ending in a 0 via regex:
pdsh -w host[0-20],/0$/ ...
If any host, hostlist, filename, or regex item is preceded by a '-' character, then
these hosts are excluded instead of including them.
Run on all hosts (-a) except host0:
pdsh -a -w -host0 ...
Exclude all hosts ending in 0:
pdsh -a -w -/0$/ ...
Exclude hosts in file /tmp/hosts:
pdsh -a -w -^/tmp/hosts ...
Additionally, a list of hosts preceded by "user@" specifies a remote username other than the default
for these hosts, and a list of hosts preceded by "rcmd_type:" specifies an alternate rcmd connect type
for the following hosts. If used together, the rcmd type must be specified first, e.g. ssh:user1@host0
would use ssh to connect to host0 as user1.
Run with user `foo' on hosts h0,h1,h2, and user `bar' on hosts h3,h5:
pdsh -w foo@h[0-2],bar@h[3,5] ...
Use ssh and user "u1" for hosts h[0-2]:
pdsh -w ssh:u1@h[0-2] ...
Note: If using the genders module, the rcmd_type for groups of hosts can be encoded
in the genders file using the special pdsh_rcmd_type attribute.
The -x option is used to exclude specific hosts from the target node list and is used simply
as
-x TARGETS...
This option may be used with other node selection options such as -a and -g (when available).
Arguments to -x may also be preceded by the filename ('^') and regex ('/') characters as described
above. As with -w, the -x option also operates on HOSTLISTS.
Exclude hosts ending in 0:
pdsh -a -x /0$/ ...
Exclude hosts in file /tmp/hosts:
pdsh -a -x ^/tmp/hosts ...
Run on hosts node1-node100, excluding node50:
pdsh -w node[1-100] -x node50 ...
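Conceptually, exclusion simply subtracts the -x set from the -w set. That set arithmetic can be modeled with grep (a toy model, not pdsh code):

```shell
# Effective target list = include list minus exclude list.
# grep -v inverts the match, -x matches whole lines, -F treats the
# exclude entries as fixed strings, -f reads them from a "file"
# (here a process substitution).
printf '%s\n' node1 node2 node3 node4 |
    grep -vxF -f <(printf '%s\n' node3)
# Expected output:
# node1
# node2
# node4
```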
As an alternative to "-w ^file", and for backwards compatibility with DSH, a file containing
a list of hosts to target may also be specified in the WCOLL environment variable,
in which case pdsh behaves just as if it had been called with "-w ^file".
Genders is a simple text database of hosts (by default
located at /etc/genders
) whose main purpose is to provide labels for lists of hosts.
It is a text file that defines sets of nodes and a label for each set (for example, all).
Labels are essentially nicknames for node lists and as such are a very convenient tool:
cnode[101-137],lmain all
cnode[101-116] category=default
cnode[117-120] largememory
lmain headnode
cnode[121-136] blades
After that, you can use the option -g blades to identify the nodes from cnode121
to cnode136 carrying the label blades. See Examples below for use of this option.
Each line of the genders file has one of the following formats. See the section HOST RANGES below
for information on host range formatting.
The nodename(s) are the shortened hostnames of a node. This is followed by any number of spaces
or tabs, and then the comma-separated list of attributes, each of which can optionally have a value.
If an attribute does not have a value, it serves as a group label for this group of hosts.
The same nodes may appear on multiple lines. However, no single node may
have duplicate attributes.
Genders is a static cluster configuration database used for cluster configuration management.
It is used by a variety of tools and scripts for management of large clusters. The genders database
is typically replicated on every node of the cluster. It describes the layout and configuration of
the cluster so that tools and scripts can sense the variations of cluster nodes. By abstracting this
information into a plain text file, it becomes possible to change the configuration of a cluster
by modifying only one file.
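The lookup that -g performs against this file can be approximated with a few lines of awk. This is only a sketch and the helper name genders_lookup is invented; the real genders library additionally expands host ranges and understands values and query syntax.

```shell
# Print a comma-separated list of node specs whose attribute list contains
# the given attribute in a genders-style file (rough stand-in for nodeattr -c).
genders_lookup() {
    local attr="$1" file="$2"
    awk -v a="$attr" '
        NF < 2 || /^#/ { next }            # skip blank lines and comments
        {
            n = split($2, attrs, ",")
            for (i = 1; i <= n; i++) {
                split(attrs[i], kv, "=")   # tolerate attr=value entries
                if (kv[1] == a) { out = out sep $1; sep = "," }
            }
        }
        END { print out }
    ' "$file"
}
```

With the sample file above, `genders_lookup blades /etc/genders` would print cnode[121-136].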
When pdsh receives SIGINT (ctrl-C), it lists the status of current threads.
A second SIGINT
within one second terminates the program.
Pending threads may be canceled by issuing ctrl-Z within one
second of ctrl-C. Pending threads are those that have not yet been initiated, or are still in the process
of connecting to the remote host.
If a remote command is not specified on the command line, pdsh runs interactively, prompting
for commands and executing them when terminated with a carriage return. In interactive mode, target
nodes that time out on the first command are not contacted for subsequent commands, and commands prefixed
with an exclamation point will be executed on the local system.
For example
[Ctrl-C]
pdsh@hpc137: interrupt (one more within 1 sec to abort)
pdsh@hpc137: (^Z within 1 sec to cancel pending threads)
pdsh@hpc137: hpc0: connecting
pdsh@hpc137: hpc1: command in progress
pdsh@hpc137: hpc2: command in progress
pdsh@hpc137: hpc3: connecting
pdsh@hpc137: hpc4: connecting
...
Another Ctrl-C within one second will cause pdsh to abort immediately, while
a Ctrl-Z within a second will cancel all pending threads, allowing threads that are connecting and "in
progress" to complete normally.
[Ctrl-C Ctrl-Z]
pdsh@hpc137: interrupt (one more within 1 sec to abort)
pdsh@hpc137: (^Z within 1 sec to cancel pending threads)
pdsh@hpc137: hpc0: connecting
pdsh@hpc137: hpc1: command in progress
pdsh@hpc137: hpc2: command in progress
pdsh@hpc137: hpc3: connecting
pdsh@hpc137: hpc4: connecting
pdsh@hpc137: hpc5: connecting
pdsh@hpc137: hpc6: connecting
pdsh@hpc137: Canceled 8 pending threads.
Here is a simple example:
pdsh -av 'grep . /proc/sys/kernel/ostype'
ehype78: Linux
ehype79: Linux
ehype76: Linux
ehype85: Linux
ehype77: Linux
ehype84: Linux
...
At a minimum pdsh requires a list of remote hosts to target and a remote
command. The standard options used to set the target hosts in pdsh are -w
and -x, which set and exclude hosts respectively. The genders database is used via the option
-g.
# export PDSH_RCMD_TYPE=ssh # To override rsh and make ssh the default
pdsh -w ssh:host[0-10] # host0,host1,host2,...host10
pdsh -w ssh:host[0-2,10] # host0,host1,host2,host10
pdsh -w ^/tmp/hosts ... # Read hosts from /tmp/hosts
pdsh -w host[0-20],/0$/ # only hosts ending in a 0, via regex
pdsh -a -w -host0 ... # Run on all hosts (-a) except host0
pdsh -a -w -^/tmp/hosts ... # Exclude hosts in file /tmp/hosts
pdsh -a -x ^/tmp/hosts ... # Same - Exclude hosts in file /tmp/hosts
pdsh -w node[1-100] -x node50 ... # Run on hosts node1-node100, excluding node50
export WCOLL=/tmp/hosts # WCOLL env variable containing a list of hosts to target
pdsh hostname # running ‘hostname’ on hosts from WCOLL
There are other pdsh modules that provide options for
creating a list of remote hosts. This is actually a pretty slick idea: if you use a scheduler,
your queries to it yield lists of hosts that can be reused. The effect of these options depends on
which module is loaded. The available modules are documented on the
Miscellaneous Modules page.
For example, you can use options
-a to target all hosts listed in genders database (if
genders module is loaded) except those with the "pdsh_all_skip" attribute. This is
shorthand for running "pdsh -A -X pdsh_all_skip ..."
-g to target
groups of hosts in genders, dshgroups, and netgroups databases depending on which module is loaded.
-j to target hosts assigned to jobs in either SLURM or Torque/PBS queues
(the slurm or torque module must be loaded)
Output from each host is displayed as it is received and is prefixed with the name of the host and
a ':' character, unless the -N option is used.
pdsh -av 'grep . /proc/sys/kernel/ostype'
hpc78: Linux
hpc79: Linux
hpc76: Linux
hpc85: Linux
hpc77: Linux
hpc84: Linux
...
You can also use a separate external query program called nodeattr for queries against the genders
database.
Here are some examples from the manpage:
Retrieve a comma separated list of all login and management nodes:
nodeattr -c "login||mgmt"
Retrieve a comma separated list of all login nodes with 4 cpus:
nodeattr -c "login&&cpus=4"
Retrieve a comma separated list of all nodes that are not login or management nodes:
nodeattr -c "~(login||mgmt)"
To use nodeattr with pdsh to run a command on all login nodes:
pdsh -w `nodeattr -c login` command
To use nodeattr in a ksh script to collect a list of users on login nodes:
for i in `nodeattr -n login`; do rsh $i who; done
To verify whether or not this node is a head node:
nodeattr head && echo yes
To verify whether or not this node is a head node and ntpserver:
nodeattr -Q "head&&ntpserver" && echo yes
FILES
/etc/genders
As you can see this is overkill and makes sense only for very large clusters. But any program
that generates a comma-delimited list of nodes can be used with the pdsh option -w. That means you can
use /etc/hosts as your database too.
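For example, a target list for -w can be scraped from an /etc/hosts-style file with awk. The helper name hosts_to_wlist and its filtering rules are illustrative; adapt the filter to your own naming convention.

```shell
# Build a comma-separated -w target list from an /etc/hosts-style file,
# skipping comments, localhost, and IPv6 special names (illustrative filter).
hosts_to_wlist() {
    awk '
        /^[[:space:]]*#/ { next }                        # comments
        $2 == "" { next }                                # blank/odd lines
        $2 ~ /^(localhost|ip6-|broadcasthost)/ { next }  # local special names
        { out = out sep $2; sep = "," }
        END { print out }
    ' "$1"
}
# Hypothetical usage:  pdsh -w "$(hosts_to_wlist /etc/hosts)" uptime
```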
-h
Output a usage message and quit. A list of available rcmd modules will also be printed at
the end of the usage message. The available options for pdsh may change
based on which modules are loaded or passed to the -M option.
-S
Return the largest of the remote command return values
-b
Batch mode. Disables the ctrl-C status feature so that a single ctrl-c kills pdsh.
-lUSER
Run remote commands as user USER. The remote username may also be specified using the
USER@TARGETS syntax with the -w option
-tSECONDS
Set a connect timeout in seconds. Default is 10.
-uSECONDS
Set a remote command timeout. The default is unlimited.
-fFANOUT
Set the maximum number of simultaneous remote commands to FANOUT. The default is
32, but can be overridden at build time.
-N
Disable the hostname: prefix on lines of pdsh output.
-V
Output pdsh version information, along with a list of currently loaded
modules, and exit.
The list of available options is determined at runtime by supplementing the list of standard pdsh
options with any options provided by loaded rcmd and misc modules. In some cases, options
provided by modules may conflict with each other. In these cases, the modules are incompatible and the
first module loaded wins.
-wTARGETS,...
Target and/or filter the specified list of hosts. Do not use with any other node selection options
(e.g. -a, -g, if they are available). No spaces are allowed in the comma-separated
list. Arguments in the TARGETS list may include normal host names, a range of hosts in hostlist
format (see HOSTLIST EXPRESSIONS), or a single '-' character to read the list of hosts on
stdin.
If a host or hostlist is preceded by a '-' character, those hosts are explicitly
excluded. If the argument is preceded by a single '^' character, it is taken to be the path to a file
containing a list of hosts, one per line. If the item begins with a '/' character, it is taken as
a regular expression on which to filter the list of hosts (a regex argument may also optionally be
trailed by another '/', e.g. /node.*/). A regex or filename argument may also be preceded by a
minus '-' to exclude instead of include those hosts.
A list of hosts may also be preceded by "user@" to specify a remote username other than the default,
or "rcmd_type:" to specify an alternate rcmd connection type for these hosts. When used together,
the rcmd type must be specified first, e.g. "ssh:user1@host0" would use ssh to connect to host0 as
user "user1."
-xhost,host,...
Exclude the specified hosts. May be specified in conjunction with other target node list
options such as -a and -g (when available). Hostlists may also be specified to
the -x option (see the HOSTLIST EXPRESSIONS section below). Arguments to -x
may also be preceded by the filename ('^') and regex ('/') characters as described above, in which
case the resulting hosts are excluded as if they had been given to -w and preceded with the
minus '-' character.
-S Return the largest of the remote command return values.
-h Output usage menu and quit. A list of available rcmd modules will also be printed at the end of
the usage message.
-s Only on AIX, separate remote command stderr and stdout into two sockets.
-q List option values and the target nodelist and exit without action.
-b Disable ctrl-C status feature so that a single ctrl-C kills parallel job. (Batch Mode)
-l user
This option may be used to run remote commands as another user, subject to authorization. For
BSD rcmd, this means the invoking user and system must be listed in the user's .rhosts file (even
for root).
-t seconds
Set the connect timeout. Default is 10 seconds.
-u seconds
Set a limit on the amount of time a remote command is allowed to execute. Default is no limit.
See note in LIMITATIONS if using -u with ssh.
-f number
Set the maximum number of simultaneous remote commands to number. The default is 32.
-R name
Set rcmd module to name. This option may also be set via the PDSH_RCMD_TYPE environment
variable. A list of available rcmd modules may be obtained via the -h, -V, or -L
options. The default will be listed with -h or -V.
-M name,...
When multiple misc modules provide the same options to pdsh, the first module initialized
"wins" and subsequent modules are not loaded. The -M option allows a list of modules to be
specified that will be force-initialized before all others, in effect ensuring that they load without
conflict (unless they conflict with each other). This option may also be set via the PDSH_MISC_MODULES
environment variable.
-L List info on all loaded pdsh modules and quit.
-N Disable hostname: prefix on lines of output.
-d Include more complete thread status when SIGINT is received, and display connect and command time
statistics on stderr when done.
-V Output pdsh version information, along with list of currently loaded modules, and exit.
qsh/mqsh module options
-n tasks_per_node
Set the number of tasks spawned per node. Default is 1.
-m block | cyclic
Set block versus cyclic allocation of processes to nodes. Default is block.
-r railmask
Set the rail bitmask for a job on a multirail system. The default railmask is 1, which corresponds
to rail 0 only. Each bit set in the argument to -r corresponds to a rail on the system, so
a value of 2 would correspond to rail 1 only, and 3 would indicate to use both rail 1 and rail 0.
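The effect of -m's block versus cyclic allocation can be sketched in bash. This is purely illustrative (the helper name place_tasks is invented; it is not pdsh or qshell code): block placement fills the first node before moving on, while cyclic deals tasks round-robin.

```shell
# Show where T tasks land on a list of nodes under block vs. cyclic placement.
place_tasks() {
    local mode="$1" ntasks="$2"; shift 2
    local -a nodes=("$@")
    local nnodes=$# t
    for ((t = 0; t < ntasks; t++)); do
        if [ "$mode" = block ]; then
            echo "task$t -> ${nodes[t * nnodes / ntasks]}"  # fill node 0 first
        else
            echo "task$t -> ${nodes[t % nnodes]}"           # round-robin
        fi
    done
}
```

For four tasks on two nodes, block placement yields n0,n0,n1,n1 while cyclic yields n0,n1,n0,n1.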
In addition to the genders options presented below, the genders attribute pdsh_rcmd_type may also
be used in the genders database to specify an alternate rcmd connect type than the pdsh default for
hosts with this attribute. For example, the following line in the genders file
host0 pdsh_rcmd_type=ssh
would cause pdsh to use ssh to connect to host0, even if rsh were the default. This can be overridden
on the commandline with the "rcmd_type:host0" syntax.
-A Target all nodes in genders database. The -A option will target every host listed in genders
-- if you want to omit some hosts by default, see the -a option below.
-a Target all nodes in genders database except those with the "pdsh_all_skip" attribute. This is
shorthand for running "pdsh -A -X pdsh_all_skip ..."
-g attr[=val][,attr[=val],...]
Target nodes that match any of the specified genders attributes (with optional values). Conflicts
with -a and -w options. This option targets the alternate hostnames in the genders
database by default. The -i option provided by the genders module may be used to translate
these to the canonical genders hostnames. If the installed version of genders supports it, attributes
supplied to -g may also take the form of genders queries. Genders queries will
query the genders database for the union, intersection, difference, or complement of genders attributes
and values. The set operation union is represented by two pipe symbols ('||'), intersection by two
ampersand symbols ('&&'), difference by two minus symbols ('--'), and complement by a tilde ('~').
Parentheses may be used to change the order of operations. See the nodeattr(1) manpage for examples
of genders queries.
-X attr[=val][,attr[=val],...]
Exclude nodes that match any of the specified genders attributes (optionally with values). This
option may be used in combination with any of the other node selection options (e.g. -w,
-g, -a). Arguments to -X may also take the form of genders queries. Please see the documentation
for the genders -g option for more information about genders queries.
-i
Request translation between canonical and alternate hostnames.
-F filename
Read genders information from filename instead of the system default genders file. If
filename doesn't specify an absolute path then it is taken to be relative to the directory
specified by the PDSH_GENDERS_DIR environment variable (/etc by default). An alternate genders
file may also be specified via the PDSH_GENDERS_FILE environment variable.
The nodeattr module supports access to the genders database via the nodeattr(1) command. See the
genders section above for a list of supported options with this module. Option usage with the
nodeattr module is the same as with the genders module above, with the exception that the -i option
may only be used with -a or -g. NOTE: This module will only work with very old
releases of genders where the nodeattr(1)
command supports the -r option, and before the libgenders API was available. Users running newer
versions of genders will need to use the genders module instead.
The slurm module allows pdsh to target nodes based on currently running SLURM
jobs. The slurm module is typically called after all other node selection options have been processed,
and if no nodes have been selected, the module will attempt to read a running jobid from the SLURM_JOBID
environment variable (which is set when running under a SLURM allocation). If SLURM_JOBID references
an invalid job, it will be silently ignored.
-j jobid[,jobid,...]
Target list of nodes allocated to the SLURM job jobid. This option may be used multiple
times to target multiple SLURM jobs. The special argument "all" can be used to target all nodes running
SLURM jobs, e.g. -j all.
The torque module allows pdsh to target nodes based on currently running Torque/PBS
jobs. Similar to the slurm module, the torque module is typically called after all other node
selection options have been processed, and if no nodes have been selected, the module will attempt to
read a running jobid from the PBS_JOBID environment variable (which is set when running under a Torque
allocation).
-j jobid[,jobid,...]
Target list of nodes allocated to the Torque job jobid. This option may be used multiple
times to target multiple Torque jobs.
rms module options
The rms module allows pdsh to target nodes based on an RMS resource. The rms module
is typically called after all other node selection options, and if no nodes have been selected, the
module will examine the RMS_RESOURCEID environment variable and attempt to set the target list of hosts
to the nodes in the RMS resource. If an invalid resource is denoted, the variable is silently ignored.
SDR module options
The SDR module supports targeting hosts via the System Data Repository on IBM SPs.
-a Target all nodes in the SDR. The list is generated from the "reliable hostname" in the SDR by
default.
-i Translate hostnames between reliable and initial in the SDR, when applicable. If a target
hostname matches either the initial or reliable hostname in the SDR, the alternate name will be substituted.
Thus a list composed of initial hostnames will instead be replaced with a list of reliable hostnames.
For example, when used with -a above, all initial hostnames in the SDR are targeted.
-v Do not target nodes that are marked as not responding in the SDR on the targeted interface. (If
a hostname does not appear in the SDR, then that name will remain in the target hostlist.)
-G In combination with -a, include all partitions.
dshgroup module options
The dshgroup module allows pdsh to use dsh (or Dancer's shell) style group files from /etc/dsh/group/
or ~/.dsh/group/.
-g groupname,...
Target nodes in dsh group file "groupname" found in either ~/.dsh/group/groupname or /etc/dsh/group/groupname.
-X groupname,...
Exclude nodes in dsh group file "groupname."
netgroup module options
The netgroup module allows pdsh to use standard netgroup entries to build lists of target hosts.
(/etc/netgroup or NIS)
PDSH_RCMD_TYPE
Equivalent to the -R option, the value of this environment variable will be used to set
the default rcmd module for pdsh to use (e.g. ssh, rsh).
PDSH_SSH_ARGS
Override the standard arguments that pdsh passes to the ssh(1) command ("-2 -a -x -l%u
%h"). The use of the parameters %u, %h, and %n (as documented in the rcmd/exec
section above) is optional. If these parameters are missing, pdsh will append them to the
ssh commandline because it is assumed they are mandatory.
PDSH_SSH_ARGS_APPEND
Append additional options to the ssh(1) command invoked by pdsh.
For example, PDSH_SSH_ARGS_APPEND="-q" would run ssh in quiet mode, or "-v" would increase the verbosity
of ssh. (Note: these arguments are actually prepended to the ssh commandline to ensure they appear
before any target hostname argument to ssh.)
WCOLL
If no other node selection option is used, the WCOLL environment variable may be set to a filename
from which a list of target hosts will be read. The file should contain a list of hosts, one per
line (though each line may contain a hostlist expression. See HOSTLIST EXPRESSIONS section
below).
DSHPATH
If set, the path in DSHPATH will be used as the PATH for the remote processes.
FANOUT
Set the pdsh fanout (See description of -f above).
Rcmd Modules
As described earlier, pdsh uses modules to implement and extend its core functionality. There are two
basic kinds of modules used in
pdsh -- "rcmd" modules, which implement the remote connection method pdsh uses to run commands, and "misc"
modules, which implement various other pdsh functionality, such as node list generation and filtering.
The current list of loaded modules is printed with the pdsh -V output
pdsh -V
pdsh-2.23 (+debug)
rcmd modules: ssh,rsh,mrsh,exec (default: mrsh)
misc modules: slurm,dshgroup,nodeupdown (*conflicting: genders)
[* To force-load a conflicting module, use the -M <name> option]
Note that some modules may be listed as conflicting with others. This is because these modules may provide the same
command line options to pdsh, and when those options conflict only one of the modules can be loaded at a time.
Detailed information about available modules may be viewed via the -L option:
> pdsh -L
8 modules loaded:
Module: misc/dshgroup
Author: Mark Grondona <[email protected]>
Descr: Read list of targets from dsh-style "group" files
Active: yes
Options:
-g groupname target hosts in dsh group "groupname"
-X groupname exclude hosts in dsh group "groupname"
Module: rcmd/exec
Author: Mark Grondona <[email protected]>
Descr: arbitrary command rcmd connect method
Active: yes
Module: misc/genders
Author: Jim Garlick <[email protected]>
Descr: target nodes using libgenders and genders attributes
Active: no
Options:
-g query,... target nodes using genders query
-X query,... exclude nodes using genders query
-F file use alternate genders file `file'
-i request alternate or canonical hostnames if applicable
-a target all nodes except those with "pdsh_all_skip" attribute
-A target all nodes listed in genders database
...
This output shows the module name, author, description, any options provided by the module, and whether the module is currently
"active" or not.
The -M option may be used to force-load a list of modules before all others, ensuring that they will be active if there is
a module conflict. In this way, for example, the genders module could be made active and the dshgroup
module deactivated for one run of pdsh. This option may also be set via the PDSH_MISC_MODULES
environment variable.
The method by which pdsh runs commands on remote hosts may be selected at runtime using the
-R option (See OPTIONS below). This functionality is ultimately implemented via dynamically
loadable modules, and so the list of available options may be different from installation to installation.
A list of currently available rcmd modules is printed when using any of the -h, -V, or
-L options. The default rcmd module will also be displayed with the -h and -V
options.
A list of rcmd modules currently distributed with pdsh follows.
rsh
Uses an internal, thread-safe implementation of BSD rcmd(3) to run commands using
the standard rsh(1) protocol.
exec
Executes an arbitrary command for each target host. The first of the pdsh remote arguments
is the local command to execute, followed by any further arguments. Some simple parameters are substituted
on the command line, including %h for the target hostname, %u for the remote username,
and %n for the remote rank [0-n] (to get a literal % use %%). For example, the
following would duplicate using the ssh module to run hostname(1) across the hosts
foo[0-10]:
pdsh -R exec -w foo[0-10] ssh -x -l %u %h hostname
and this command line would run
grep(1) in parallel across the files console.foo[0-10]:
pdsh -R exec -w foo[0-10] grep BUG console.%h
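The parameter substitution performed by the exec module can be mimicked in plain bash to preview what command each host would get. The function render_exec_cmd is a made-up illustration, not part of pdsh, and it only handles %h.

```shell
# For each target host, print the command line after exec-style substitution
# of %h (hostname). Illustrative only; pdsh also substitutes %u, %n, and %%.
render_exec_cmd() {
    local tmpl="$1"; shift
    local h
    for h in "$@"; do
        printf '%s\n' "${tmpl//%h/$h}"   # bash replaces every %h occurrence
    done
}
```

For example, `render_exec_cmd 'grep BUG console.%h' foo0 foo1` prints the two grep command lines that the exec module would run locally.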
ssh
Uses a variant of popen(3)
to run multiple copies of the ssh(1)
command.
mrsh
This module uses the mrsh(1) protocol to execute jobs on remote hosts. The mrsh protocol
uses a credential based authentication, forgoing the need to allocate reserved ports. In other aspects,
it acts just like rsh. Remote nodes must be running mrshd(8) in order for the mrsh module
to work.
qsh
Allows pdsh to execute MPI jobs over QsNet. Qshell propagates the current working directory,
pdsh environment, and Elan capabilities to the remote process. The following environment variables
are also appended to the environment: RMS_RANK, RMS_NODEID, RMS_PROCID, RMS_NNODES, and RMS_NPROCS.
Since pdsh needs to run setuid root for qshell support, qshell does not directly support propagation
of LD_LIBRARY_PATH and LD_PREOPEN. Instead, the QSHELL_REMOTE_LD_LIBRARY_PATH and QSHELL_REMOTE_LD_PREOPEN
environment variables may be used; they will be remapped to LD_LIBRARY_PATH and LD_PREOPEN by
the qshell daemon if set.
mqsh
Similar to qshell, but uses the mrsh protocol instead of the rsh protocol.
krb4
The krb4 module allows users to execute remote commands after authenticating with Kerberos. Of
course, the remote rshd daemons must be kerberized.
xcpu
The xcpu module uses the xcpu service to execute remote commands.
Limitations
When using ssh for remote execution, expect the stderr of ssh to be folded in with that of
the remote command. When invoked by pdsh, it is not possible for ssh to prompt for passwords
if RSA/DSA keys are configured properly. For ssh implementations that support a connect
timeout option, pdsh attempts to use that option to enforce the timeout (e.g. -oConnectTimeout=T
for OpenSSH); otherwise connect timeouts are not supported when using ssh. Finally, there is
no reliable way for pdsh to ensure that remote commands are actually terminated when using a
command timeout. Thus, if -u is used with ssh, commands may be left running on remote hosts
even after the timeout has killed the local ssh processes.
Output from multiple processes per node may be interspersed when using qshell or mqshell rcmd modules.
The number of nodes that pdsh can simultaneously execute remote jobs on is limited by the
maximum number of threads that can be created concurrently, as well as the availability of reserved
ports in the rsh and qshell rcmd modules. On systems that implement POSIX threads, the limit is typically
defined by the constant PTHREAD_THREADS_MAX.
The cluster comes with a simple parallel shell named pdsh. The pdsh shell is handy for
running commands across the cluster. There is a man page that describes the capabilities of pdsh
in detail. One of its useful features is the ability to target all or a subset of the
cluster. For example: pdsh -a targets all nodes of the cluster, including the master.
pdsh -a -x node00 targets all nodes of the cluster except the master. pdsh node[01-08]
targets the 8 nodes of the cluster named node01, node02, . . ., node08.
Another utility that is useful for formatting the output of pdsh is dshbak. Here we will
show some handy uses of pdsh.
Show the current date and time on all nodes of the cluster. pdsh -a date
Show the current load and system uptime for all nodes of the cluster. pdsh -a
uptime
Show the version of the Operating System on all nodes.
pdsh -a cat /etc/redhat-release
Check who is logged in at the MetaGeek lab!
pdsh -w node[01-32] who
Show all processes that have the substring pbs on the cluster. These will be the PBS
servers running on each node.
pdsh -a ps augx | grep pbs | grep -v grep
The utility dshbak formats the output from pdsh by consolidating the output from
each node. The option -c shows identical output from different nodes just once. Try
the following commands.
pdsh -w node[01-32] who | dshbak
pdsh -w node[01-32] who | dshbak -c
pdsh -a date | dshbak -c
Administrators can build wrapper commands around pdsh for commands that are
frequently used across multiple systems and Serviceguard clusters. Several such wrapper
commands are provided with DSAU. These wrappers are Serviceguard cluster-aware and default to
fanning out cluster-wide when used in a Serviceguard environment. These wrappers support most
standard pdsh command line options and also support long options (--option syntax).
cexec is a general purpose pdsh wrapper. In addition to the standard
pdsh features, cexec includes a reporting feature. Use the
--report_loc option to have cexec display the report location for a command.
The command report records the command issued in addition to the nodes where the command
succeeded, failed, or the nodes that were unreachable. The report can be used with the
--retry option to replay the command against nodes that failed, succeeded, were
unreachable, or all nodes.
ccp
ccp is a wrapper for pdcp and copies files cluster-wide or to the
specified set of systems.
cps
cps fans out a ps command across a set of systems or cluster.
ckill
ckill allows the administrator to signal a process by name since the pid of a
specific process will vary across a set of systems or the members of a cluster.
cuptime
cuptime displays the uptime statistics for a set of systems or a cluster.
cwall
cwall displays a wall(1M) broadcast message on multiple hosts.
All the wrappers support the CFANOUT_HOSTS environment variable when not executing in a
Serviceguard cluster. The environment variable specifies a file containing the list of hosts to
target, one hostname per line. This will be used if no other host specifications are present on
the command line. When no target nodelist command line options are used and CFANOUT_HOSTS is
undefined, the command will be executed on the local host.
For more information on these commands, refer to their reference manpages.
Hm, this seems like a good idea, but I'm not sure dshbak is the right
place for this. (That script is meant to simply reformat output which
is prefixed by "node: ")
If you'd like to track up/down nodes, you should check out Al Chu's
Cerebro and whatsup/libnodeupdown:
http://www.llnl.gov/linux/cerebro/cerebro.html
http://www.llnl.gov/linux/whatsup/
But I do realize that reporting nodes that did not respond to pdsh
would also be a good feature. However, it seems to me that pdsh itself
would have to do this work, because only it knows the list of hosts originally
targeted. (How would dshbak know this?)
As an alternative I sometimes use something like this:
# pdsh -a true 2>&1 | sed 's/^[^:]*: //' | dshbak -c
----------------
emcr[73,138,165,293,313,331,357,386,389,481,493,499,519,522,526,536,548,553,560,564,574,601,604,612,618,636,646,655,665,676,678,693,700-701,703,706,711,713,715,717-718,724,733,737,740,759,767,779,817,840,851,890]
----------------
mcmd: connect failed: No route to host
----------------
emcrj
----------------
mcmd: xpoll: protocol failure in circuit setup
i.e. strip off the leading pdsh@...: and send all errors to stdout. Then
collect errors with dshbak to see which hosts are not reachable.
Maybe we should add an option to pdsh to issue a report of failed hosts
at the end of execution?
mark
>
NOTE: if you don't want to enter passwords for each server, then you need to have an
authorized_key installed on the remote servers. If necessary, you can use the environment
variable PDSH_SSH_ARGS to specify ssh options, including which identity file to
use ( -i ).
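A hedged sketch of that (the host range and key path are assumptions; note that PDSH_SSH_ARGS replaces pdsh's default ssh argument list, whereas PDSH_SSH_ARGS_APPEND adds to it):

```shell
# Append an identity-file option to pdsh's default ssh arguments.
export PDSH_SSH_ARGS_APPEND="-i $HOME/.ssh/cluster_key"
pdsh -R ssh -w node[01-08] uptime
```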
The commands will be run in parallel on all servers, and output from them will be
intermingled (with the hostname pre-pended to each output line). You can view the output nicely
formatted and separated by host using pdsh 's dshbak utility:
dshbak logfile.txt | less
Alternatively, you can pipe through dshbak before redirecting to a logfile:
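For example (host range assumed):

```shell
pdsh -R ssh -w node[01-08] uname -r | dshbak > logfile.txt
```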
IMO it's better to save the raw log file and use dshbak when required, but
that's just my subjective preference. For remote commands that produce only a single line of
output (e.g. uname or uptime), dshbak is overly verbose, as the raw output is
already nicely concise. e.g. from my home network:
You can define hosts and groups of hosts in a file called /etc/genders and then
specify the host group with pdsh -g instead of pdsh -w . e.g. with an
/etc/genders file like this:
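The file listing itself is missing here; an /etc/genders file consistent with the commands that follow would look something like this (hostnames assumed):

```
server1 all,web
server2 all,web
server3 all
server4 all
server5 all,mysql
server6 all,mysql
```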
pdsh -g all uname -a will run uname -a on all servers. pdsh
-g web uptime will run uptime only on server1 and server2. pdsh -g
web,mysql df -h / will run df on servers 1, 2, 5, and 6. And so on.
BTW, one odd thing about pdsh is that it is configured to use rsh
by default instead of ssh . You need to either:
use -R ssh on the pdsh command line (e.g. pdsh -R ssh -w server[0-9]
...
export PDSH_RCMD_TYPE=ssh before running pdsh
run echo ssh > /etc/pdsh/rcmd_default to set ssh as the
permanent default.
There are several other tools that do the same basic job as pdsh . I've tried
several of them and found that they're generally more hassle to set up and use.
pdsh pretty much just works with zero or minimal configuration.
dsh -q displays the values of the dsh variables (DSH_NODE_LIST, DCP_NODE_RCP...)
dsh <command> runs command on each server in DSH_NODE_LIST
dsh <command> | dshbak same as above, but formats the output to separate each host
dsh -w aix1,aix2 <command> execute command on the given servers (dsh -w aix1,aix2 "oslevel -s")
dsh -e <script> run the given script on each server (for me it was faster to dcp it and then run the script with dsh on the remote server)
dcp <file> <location> copies a file to the given location (without a location, the home dir will be used)
dping -n aix1,aix2 do a ping on the listed servers
dping -f <filename> do a ping for all servers given in the file (-f)
I use it heavily in the operations team at Acquia, and it has served
me extremely well when I'm in a tight spot and I need to run a command across a large set of servers
quickly. A quick tip about use: I tend to run pdsh with this environment variable setting, especially
since servers can commonly be relaunched in a cloud environment, and I don't want to deal with my
SSH known_hosts file being inaccurate:
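The exact setting did not survive in this copy; a common choice matching the description (ignore stale host keys for relaunched cloud servers) is:

```shell
# Assumption: disable known_hosts checking entirely; acceptable only where
# you trust the network more than the accuracy of known_hosts.
export PDSH_SSH_ARGS_APPEND="-o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null"
```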
Example -
using a HEREDOC (here-document) and sending quotation marks in a command with PDSH
Here documents (heredocs)
are a nice way to embed multi-line content in a single command, enabling the scripting of a file
creation rather than the clumsy instruction to "open an editor and paste the following lines
into it and save the file as /foo/bar".
Fortunately heredocs work just fine with pdsh, so long as you remember to enclose the whole command
in quotation marks. And speaking of which, if you need to include quotation marks in your actual
command, you need to escape them with a backslash. Here's an example of both, setting up the configuration
file for my ever-favourite
gnu screen on all the nodes of the cluster:
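The configuration example itself was lost from this copy; a minimal sketch of the pattern (host range and .screenrc contents are assumptions) shows both the quoted heredoc and backslash-escaped inner quotes:

```shell
# The whole remote command is one double-quoted string, so the heredoc and
# the backslash-escaped quotes inside it reach each node intact.
pdsh -w node[01-08] "cat > ~/.screenrc <<'EOF'
hardstatus alwayslastline
hardstatus string \"%H %c\"
EOF"
```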
Obviously if you have Puppet Enterprise
fully integrated within your environment, you can take advantage of powerful tools such as
mcollective. If you do not, pdsh is a great
alternative.
Now I can shift into second gear and try some fancier pdsh tricks. First, I want to run a more
complicated command on all of the nodes. Notice that I put the entire command in quotes. This means
the entire command is run on each node, including the first (cat /proc/cpuinfo) and second
(egrep 'bogomips|model|cpu') parts.
[shaha@oc8535558703 PDSH]$ pdsh "cat /proc/cpuinfo | egrep 'bogomips|model|cpu' "
ubuntu@ec2-52-58-254-227: cpu family : 6
ubuntu@ec2-52-58-254-227: model : 63
ubuntu@ec2-52-58-254-227: model name : Intel(R) Xeon(R) CPU E5-2676 v3 @ 2.40GHz
ubuntu@ec2-52-58-254-227: cpu MHz : 2400.070
ubuntu@ec2-52-58-254-227: cpu cores : 1
ubuntu@ec2-52-58-254-227: cpuid level : 13
ubuntu@ec2-52-58-254-227: bogomips : 4800.14
ec2-user@ec2-52-59-121-138: cpu family : 6
ec2-user@ec2-52-59-121-138: model : 62
ec2-user@ec2-52-59-121-138: model name : Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
ec2-user@ec2-52-59-121-138: cpu MHz : 2500.036
ec2-user@ec2-52-59-121-138: cpu cores : 1
ec2-user@ec2-52-59-121-138: cpuid level : 13
ec2-user@ec2-52-59-121-138: bogomips : 5000.07
[shaha@oc8535558703 PDSH]$
Now you can try executing commands on the cluster nodes, all at the same
time. For example, let's run "uptime" on all nodes:
[root@headnode data]# pdsh -w headnode,sm,node1,node2 uptime
failed to install module options for "misc/dshgroup"
headnode: Warning: Permanently added the RSA host key for IP address '192.168.20.100' to the list
of known hosts.
sm: 16:03:00 up 7:14, 1 user, load average: 0.00, 0.00, 0.00
headnode: 16:53:16 up 7:23, 1 user, load average: 0.00, 0.04, 0.01
node2: 15:55:21 up 2:31, 1 user, load average: 0.04, 0.01, 0.00
node1: 15:56:07 up 2:31, 1 user, load average: 0.00, 0.00, 0.00
[root@headnode data]#
You can keep a list of all your machines in a file (/etc/machines), which pdsh reads by default.
[root@headnode ~]# vi /etc/machines
headnode
sm
node1
node2
Note: -a will read hostnames from the default machines file (/etc/machines), as shown below:
[root@headnode ~]# pdsh -a uptime
failed to install module options for "misc/dshgroup"
sm: 14:09:58 up 5:10, 1 user, load average: 0.01, 0.00, 0.00
headnode: 15:00:11 up 5:11, 1 user, load average: 0.08, 0.02, 0.01
node2: 14:02:17 up 5:10, 1 user, load average: 0.00, 0.00, 0.00
node1: 14:03:05 up 5:10, 1 user, load average: 0.00, 0.00, 0.00
[root@headnode ~]#
The other utility included in the pdsh RPM is pdcp, which copies a file to multiple machines. However,
for pdcp to work, all nodes involved in the pdcp operation must have a local copy of pdcp installed.
So for convenience, we will copy the pdsh-* RPMs to all the compute nodes as well.
Pdsh
is an amazing tool that helps you execute commands across the nodes connected by pdsh. The
script depends heavily on this tool and on pdcp, which is included in the toolkit too. Before
you run the actual script, please set up pdsh correctly.
sudo apt-get install pdsh
When pdsh is installed, some configuration still needs to be done. First, change the
default Remote Command Service (RCMD) to ssh, since by default pdsh
uses rcmd, not ssh, to execute commands on a remote client.
echo 'ssh' > /etc/pdsh/rcmd_default
This will save you from typing -R ssh in pdsh & pdcp every
time. After changing the protocol to ssh, we next set up password-less connections
between nodes, for not only pdsh but also Yarn.
pdcp from the
pdsh package is one
option. pdsh was written to help with management of HPC clusters - I've used it
for that, and I've also used it for management of multiple non-clustered machines.
pdsh and pdcp use
genders to define
hosts and groups of hosts (a "group" is any arbitrary tag you choose to assign to a host, and
hosts can have as many tags as you want.)
For example, if you had a group called 'webservers' in /etc/genders that included
hostA, hostB, hostC, then pdcp -g webservers myscript.sh /usr/local/bin would
copy myscript.sh into /usr/local/bin/ on all three hosts.
Similarly, pdsh -g all uname -r would run uname -r on every host
tagged with "all" in /etc/genders, with the output from each host prefixed with the host's
name.
pdsh commands and pdcp copies are executed in parallel (with limits and timeouts to prevent
overloading of the originating system).
When the command being run produces multi-line output, it can get quite confusing to read.
Another program in the pdsh package called dshbak can group the output
by hostname for easier reading.
After seeing all your comments, it's possible that pdsh & pdcp may be overkill for your
needs... it's really designed to be a system admin's tool rather than a normal non-root user's
tool.
It may be that writing a simple shell script wrapper around scp may be good enough for you.
e.g. here's an extremely simple, minimalist version of such a wrapper script.
#! /bin/bash
# a better version would use a command line arg (e.g. -h) to get a
# comma-separated list of hostnames, but hard-coding it here illustrates
# the concept well enough (the user@host values are placeholders).
HOSTS="user@host1.example.com user@host2.example.com"
# last argument is the target directory on the remote hosts
target_dir="${!#}"
# all but the last arg are the files to copy
files=("${@:1:$#-1}")
for h in $HOSTS; do
scp "${files[@]}" "$h:$target_dir"
done
[David: This probably won't actually post, because I'm sending it from
another account than the one that's subscribed to the list. I don't want to
fix this address issue right now. Feel free to share the below with the
list.]
In HPC clusters, I use and love pdsh, built with genders support.
http://sourceforge.net/projects/pdsh/http://sourceforge.net/projects/genders/
Some quick 'n' dirty examples follow, leaving out a whole lot of context and
discussion and fine points. I just want to get across some of the joy of
having a tool like this. :)
Suppose you have a 1000-node cluster with hostnames like node0001 through
node1000, and pdsh with genders support. You could create a file
/etc/genders with contents like this:
node[0001-1000],adm[01-03] all
node[0001-0040] compute
node[0001-0040] rack1
...
node[0951-1000] highmem
adm[01-03] admin
and to find which compute nodes are down, do something like this:
pdsh -g compute true
whichever ones are down will eventually return an ssh error. (This assumes
passwordless ssh.)
To check only rack 1: pdsh -g rack1 true
Or only your high-memory nodes: pdsh -g highmem true
Or only your administrative servers: pdsh -g admin true
Or to check whether all have the same filesystems mounted:
pdsh -g all "mount | sort" | dshbak -c
will return a summary of which nodes returned the same output. It might
look like this:
node[0001-0432],node[0434-1000]
-----------
<some normal output for compute nodes>
admin[01-03]
-----------
<some normal output for admin nodes>
node0433
-----------
<some abnormal output>
and now you know you have a problem with node0433.
pdsh is fast. Running the preceding command might take 5-10 seconds to get
the results on a typical 1000-node cluster, so it is quick and easy to use,
and supports quick work (especially ad hoc investigations of what's going
on). The quickness is because pdsh forks several worker processes, 32 by
default; you can change that higher (for more parallelism) or lower (e.g. 1
to get serial operation) with the -f <number> option.
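The sliding-window behaviour described above can be approximated in plain shell with xargs -P, which caps concurrency the way pdsh's -f caps threads (machines.txt and the ssh invocation are assumptions, not pdsh internals):

```shell
# At most 32 ssh processes run at once; a new one starts as each finishes.
xargs -P 32 -I{} ssh -o BatchMode=yes {} uptime < machines.txt
```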
I also love the "exec" functionality of pdsh. For example:
pdsh -R exec -g all ping -c1 -w1 %h
will ping every node instead of ssh'ing to every node to run a command. The
advantage of doing this with pdsh instead of a loop is that it takes
advantage of the parallelism offered by pdsh. Ping is a trivial example;
you can do lots of other things of course, like run an Expect script across
all your network switches. :)
David
On Mon, Jul 12, 2010 at 10:52 AM, David N. Blank-Edelman <dnb at ccs.neu.edu> wrote:
> Hi-
> It's been a bit quiet here so I thought I'd ask a favorite tool question.
>> I've been looking at the current crop of utilities that allow you to easily
> run a command on N of your machines in parallel.
>> There are sort of two flavors:
> 1) those that let you type the same thing on N machines interactively
> (e.g. http://guichaz.free.fr/gsh/ or
>http://sourceforge.net/projects/clusterssh/) and
>> 2) those that just run the command line as given (e.g.
> http://web.taranis.org/shmux/, http://www.netfort.gr.jp/~dancer/software/dsh.html,
> http://code.google.com/p/parallel-ssh/, http://sourceforge.net/projects/pdsh/,
> http://sourceforge.net/projects/mussh/).
>> I'm mostly interested in the tools in category #2, but I'd be happy to hear
> about cool ones from #1 as well.
>> What do you use and why? What do you like about it, dislike about it?
>> Thanks!
>> -- dNb
> _______________________________________________
> sage-members mailing list
> sage-members at mailman.sage.org
> http://mailman.sage.org/mailman/listinfo/sage-members
Let's say I save this file as machines.txt. I can then run a command in parallel across
all these machines:
$ pdsh -R ssh -w ^machines "<command>"
Here are some things you can do with PDSH that you might find useful
Find all python processes running on these machines. $ pdsh -R ssh -w ^machines "ps aux
| grep -i python"
Kill any processes being run by my user. (Super useful if you forget to log out of a lab machine.)
$ pdsh -R ssh -w ^machines "killall -u `whoami`"
Check a specific log file for errors. $ pdsh -R ssh -w ^machines "grep -i error /path/to/log"
It's a handy UNIX tool to have in your arsenal when working with lots of machines. Clearly, I
am only showing the usage of pdsh in the most basic way. Check out
PDSH on Google Code for a more detailed description
of everything PDSH can do.
Optional: If you want a default machine list for pdsh it needs to be created at /etc/machines. I
wanted a list of all compute nodes so I did the following to generate it:
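The exact command is not shown; one hedged way to generate such a list, assuming compute nodes named node01 through node64:

```shell
# Zero-padded hostnames, one per line; adjust the range to your cluster.
seq -w 1 64 | sed 's/^/node/' > machines
# then install it, e.g.: sudo cp machines /etc/machines
```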
First of all, ClusterShell was developed to be easily adopted by people previously using pdsh.
As a consequence, the command line tools
clush and
clubak support very similar behaviour
and options.
clush
clush standard command line is the same:
$ pdsh -w foo[1-5] echo "Hello World"
is
$ clush -w foo[1-5] echo "Hello World"
host selection options are supported (-w -x -g -X)
ssh related options are supported (-f -t -u -l)
File copies are supported. Equivalent to pdcp and rpdcp are available
through clush options
And others. Any simple pdsh command can be adapted by simply changing the command
name to clush.
On 11 May 2010, at 19:20, Prentice Bisbal wrote:
> Since so many of you use and recommend pdsh, I have a few questions for
> you:
>> 1. Do you build and RPM from the .spec file, which doesn't support
> genders, or do you configure/compile yourself?
I build it myself. From the top of my head the options I use are --with-ssh --without-rsh. Last time I built it, if both were built the default was to prefer rsh over ssh, which should probably be changed at some point.
> 2. If not using genders, what is the syntax of the /etc/machines file? I
> assume it's the same as the gender file, but that's just a hunch.
It's just a flat list of hosts, one per line, although I believe it can take host-specs as well, e.g. compute[0-1023].
> 3. Are there any advantages/disadvantages to using machines over genders?
Genders is much more flexible, machines is easier to configure.
Two more things of note, "dshbak -c" is worth knowing about, pipe the output of pdsh into this and it'll sort the output by hostname and compress hosts with identical output into a single report.
The other really useful aspect of pdsh is the "-R exec" option: instead of running the command on a remote node, it runs the command locally but replaces %h with the hostname. One trivial example is "pdsh -a -R exec grep %h /var/log/messages | dshbak -c", but once you get used to it you can use it for much more advanced commands. Earlier on today I ran "pdsh -w [0-25] -R exec tune2fs -O extents /dev/mapper/ost_%h" to re-tune all the devices in a lustre filesystem.
Ashley.
--
Ashley Pittman, Bath, UK.
Padb - A parallel job inspection tool for cluster computing
http://padb.pittman.org.uk
+-------------+
| Description |
+-------------+
Pdsh is a multithreaded remote shell client which executes commands on
multiple remote hosts in parallel. Pdsh can use several different
remote shell services, including standard "rsh", Kerberos IV, and ssh.
See the man page in the doc directory for usage information.
Pdsh uses GNU autoconf for configuration. Dynamically loadable
modules of each shell service (as well as other features) will be
compiled based on configuration. By default, rsh, Kerberos IV,
and SDR (for IBM SPs) will be compiled if they exist on the system.
The README.modules file distributed with pdsh contains a description
of each module available, as well as its requirements and/or
conflicts.
If your system does not support dynamically loadable modules, you
may compile modules in statically using the --enable-static-modules
option.
To configure in additional feature modules:
./configure [options]
--without-rsh
Disable support for BSD rcmd(3) (standard rsh).
--with-ssh
Enable support of ssh(1) remote shell service.
--with-machines=/path/to/machines
Use a flat file list of machine names for -a instead of
genders, nodeattr, or SDRGetObjects.
--with-qshell
Enable support for running parallel jobs on the Quadrics Elan
interconnect via the qshell service option (-R qsh) and qshell daemon.
See README.QsNet for more information.
--with-genders
Enable support of a genders database through the genders(3)
library. For pdsh's -i option to function properly, the genders
database must have alternate node names listed as the value of
the "altname" attribute.
--with-dshgroups
Enable support of dsh-style group files in ~/.dsh/group/groupname
or /etc/dsh/group/groupname. Allows use of -g/-X to target
or exclude hosts in dsh group files.
--with-netgroup
Enable use of netgroups (via /etc/netgroup or NIS) to build lists
of target hosts using -g/-X to include/exclude hosts.
--with-nodeattr=/path/to/nodeattr
Enable support of a genders database through the nodeattr(1)
command. This is primarily for older systems that do not yet
have genders(3) library support. For pdsh's -i option to
function properly, the genders database must have alternate
node names listed as the value of the "altname" attribute and
the nodeattr command must have the -r option available.
--with-nodeupdown
Enable support of dynamic elimination of down nodes through
the nodeupdown(3) library.
--with-mrsh
Enable support of mrsh(1) remote shell service.
--with-mqshell
Enable support for running parallel jobs on the Quadrics Elan
interconnect via the mqshell service option (-R mqsh) and
mqshell daemon. Mqshell is identical to qshell but adds munge
authentication (the authentication used by mrsh).
--with-rms
Support running pdsh under RMS allocation.
--with-slurm
Support running pdsh under SLURM allocation.
--with-fanout=N
Specify default fanout (default is 32).
--with-timeout=N
Set default connect timeout (default is 10 seconds).
--with-readline
Use the GNU readline library to parse input in interactive mode.
--without-pam
Disable PAM from the qshell and mqshell daemons. By default,
they are enabled.
Note that a number of the above configurations options may "conflict"
with each other because they perform identical operations. For
example, genders and nodeattr both support the -g option. If several
modules are installed that support identical options, the options will
default to one particular module. Static compilation of modules will
fail if conflicting modules are selected. See the man page in this
directory for details on which modules conflict.
+------------+
| INSTALLING |
+------------+
make
make install
By default, pdsh is now installed without setuid permissions. This
is because, for the majority of the rcmd connect protocols, root
permissions are not necessarily needed. If you are using either of
the "rcmd/rsh" or "rcmd/qsh" modules, you will need to change the
permissions of pdsh and pdcp to be setuid root after the install.
For example:
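The example itself is missing from this copy; a plausible sketch, assuming the default /usr/local install prefix:

```shell
# Make pdsh and pdcp setuid root (needed only for rcmd/rsh and rcmd/qsh).
chown root /usr/local/bin/pdsh /usr/local/bin/pdcp
chmod u+s /usr/local/bin/pdsh /usr/local/bin/pdcp
```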
If you compile the qshell and/or mqshell with PAM support, remember to
update your PAM configuration files to support the "qshell" and/or
"mqshell" service names. There are sample xinetd(8) config files
for qshd and mqshd in the etc/ directory. Also be sure to read the
README.QsNet file in this directory.
+---------+
| GOTCHAS |
+---------+
Watch out for the following gotchas:
1) When executing remote commands via rsh, krb4, qsh, or ssh, pdsh
uses one reserved socket for each active connection, two if it is
maintaining a separate connection for stderr. It obtains these
sockets by calling rresvport(), which normally draws from a pool of
256 sockets. You may exhaust these if multiple pdsh's are running
simultaneously on a machine, or if the fanout is set too high. Mrsh
and mqsh do not use reserved ports, and are therefore not affected
by this problem as severely.
2) When pdsh is using a remote shell service that is wrapped with TCP
wrappers, there are three areas where bottlenecks can be created:
IDENT, DNS, and SYSLOG. If your hosts.allow includes "user@", e.g.
"in.rshd : ALL@ALL : ALLOW" and TCP wrappers is configured to support
IDENT, each simultaneous remote shell connection will result in an
IDENT query back to the source. For large fanouts this can quickly
overwhelm the source. Similarly, if TCP wrappers is configured to
query the DNS on every connection, pdsh may overwhelm the DNS server.
Finally, if every remote shell connection results in a remote syslog
entry, syslogd on your loghost may be overwhelmed and logs may grow
excessively long.
If local security policy permits, consider configuring TCP wrappers to
avoid calling IDENT, DNS, or SYSLOG on every remote shell connection.
Configuring without the "PARANOID" option (which requires all
connections to be registered in the DNS), permitting a simple list of
IP addresses or a subnet (no names, and no user@ prefix), and setting
the SYSLOG severity for the remote shell service to a level that is
not remotely logged will avoid these pitfalls. If these actions are
not possible, you may wish to reduce pdsh's default fanout (configure
--with-fanout=N).
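For instance, a hosts.allow entry following those guidelines, using a plain subnet instead of the user@ form quoted earlier (addresses are assumptions):

```
in.rshd : 192.168.20.0/255.255.255.0 : ALLOW
```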
+---------------------+
| THEORY OF OPERATION |
+---------------------+
We will generalize for the common remote shell service rsh. The
following is similar for all other shell services (ssh, krb4, qsh,
etc.), but other shell services may include additional security or
features.
A thread is created for each rsh connection to a node. Each thread
opens a connection using an MT-safe rcmd-like function, returns
stdout and stderr streams, then terminates.
The mainline starts fanout number of rsh threads and waits on a
condition variable that is signalled by the rsh threads as they
terminate. When the condition variable is signalled, the main thread
starts a new rsh thread to maintain the fanout, until all remote
commands have been executed.
A timeout thread is created that monitors the state of the threads and
terminates any that take too much time connecting or, if requested on
the command line, take too long to complete.
Typing ^C causes pdsh to list threads that are in the connected state.
Another ^C immediately following the first one terminates the program.
Please send suggestions, bug reports, or just a note letting me know
that you are using pdsh (it would be interesting to hear how many
nodes are in your cluster).
+------+
| NOTE |
+------+
This product includes software developed by the University of
California, Berkeley and its contributors. Modifications have been
made and bugs are probably mine.
The PDSH software package has no affiliation with the Democratic Party
of Albania (www.pdsh.org).
From: Mark A. Grondona <mgrondona@ll...>
- 2011-11-08 16:51:35
On Mon, 7 Nov 2011 17:24:32 -0800, Michael Lampe <mlampe0@...> wrote:
> Mark A. Grondona wrote:
>
> > You can set /proc/sys/net/ipv4/tcp_tw_recycle and
> > /proc/sys/net/ipv4/tcp_tw_reuse on the nodes if you want to be
> > sure TIME_WAIT connections aren't interfering here, but
> > I really don't think that is the problem.
>
> Yep, the problem is on the frontend, not on the nodes.
>
> To repeat myself in this thread:
>
> "The idiot has been identified -- it's me."
>
> When I assembled & installed this cluster ~3 years ago, I thought it a
> good idea to have it a firewall -- and why not allow the nodes also
> access to the outer world?
>
> Because I was too lame to learn iptables, I installed firestarter. A
> nice little tool, that does exactly that with just a few clicks. It's
> really a nice little tool, but it also installed a file
> /etc/firestarter/sysctl-tuning, which -- just for your personal
> amusement -- is attached.
>
> Nobody ever complained about this "tuning" -- OK, MPI uses Infiniband,
> but NFS (over TCP/IP) is as ok as it can be. And I'm also eating my dog
> food ...
>
> So again: Thanks a bunch!! (And sorry for the wasted time.)
No problem, and glad to hear things are working now! ;-)
>
> -Michael
>
>
> ======================================================================
>
>
> # --------( Sysctl Tuning - Recommended Parameters )--------
>
> # Turn off IP forwarding by default
> # (this will be enabled if you require masquerading)
>
> if [ -e /proc/sys/net/ipv4/ip_forward ]; then
> echo 0 > /proc/sys/net/ipv4/ip_forward
> fi
>
> # Do not log 'odd' IP addresses (excludes 0.0.0.0 & 255.255.255.255)
>
> if [ -e /proc/sys/net/ipv4/conf/all/log_martians ]; then
> echo 0 > /proc/sys/net/ipv4/conf/all/log_martians
> fi
>
>
> # --------( Sysctl Tuning - TCP Parameters )--------
>
> # Turn off TCP Timestamping in kernel
> if [ -e /proc/sys/net/ipv4/tcp_timestamps ]; then
> echo 0 > /proc/sys/net/ipv4/tcp_timestamps
> fi
>
> # Set TCP Re-Ordering value in kernel to '5'
> if [ -e /proc/sys/net/ipv4/tcp_reordering ]; then
> echo 5 > /proc/sys/net/ipv4/tcp_reordering
> fi
>
> # Turn off TCP ACK in kernel
> if [ -e /proc/sys/net/ipv4/tcp_sack ]; then
> echo 0 > /proc/sys/net/ipv4/tcp_sack
> fi
>
> #Turn off TCP Window Scaling in kernel
> if [ -e /proc/sys/net/ipv4/tcp_window_scaling ]; then
> echo 0 > /proc/sys/net/ipv4/tcp_window_scaling
> fi
>
> #Set Keepalive timeout to 1800 seconds
> if [ -e /proc/sys/net/ipv4/tcp_keepalive_time ]; then
> echo 1800 > /proc/sys/net/ipv4/tcp_keepalive_time
> fi
>
> #Set FIN timeout to 30 seconds
> if [ -e /proc/sys/net/ipv4/tcp_fin_timeout ]; then
> echo 30 > /proc/sys/net/ipv4/tcp_fin_timeout
> fi
>
> # Set TCP retry count to 3
> if [ -e /proc/sys/net/ipv4/tcp_retries1 ]; then
> echo 3 > /proc/sys/net/ipv4/tcp_retries1
> fi
>
> #Turn off ECN notification in kernel
> if [ -e /proc/sys/net/ipv4/tcp_ecn ]; then
> echo 0 > /proc/sys/net/ipv4/tcp_ecn
> fi
>
>
> # --------( Sysctl Tuning - SYN Parameters )--------
>
> # Turn on SYN cookies protection in kernel
> if [ -e /proc/sys/net/ipv4/tcp_syncookies ]; then
> echo 1 > /proc/sys/net/ipv4/tcp_syncookies
> fi
>
>
> # Set SYN ACK retry attempts to '3'
> if [ -e /proc/sys/net/ipv4/tcp_synack_retries ]; then
> echo 3 > /proc/sys/net/ipv4/tcp_synack_retries
> fi
>
> # Set SYN backlog buffer to '64'
> if [ -e /proc/sys/net/ipv4/tcp_max_syn_backlog ]; then
> echo 64 > /proc/sys/net/ipv4/tcp_max_syn_backlog
> fi
>
> # Set SYN retry attempts to '6'
> if [ -e /proc/sys/net/ipv4/tcp_syn_retries ]; then
> echo 6 > /proc/sys/net/ipv4/tcp_syn_retries
> fi
>
>
> # --------( Sysctl Tuning - Routing / Redirection Parameters )--------
>
> # Turn on source address verification in kernel
> if [ -e /proc/sys/net/ipv4/conf/all/rp_filter ]; then
> for f in /proc/sys/net/ipv4/conf/*/rp_filter
> do
> echo 1 > $f
> done
> fi
>
> # Turn off source routes in kernel
> if [ -e /proc/sys/net/ipv4/conf/all/accept_source_route ]; then
> for f in /proc/sys/net/ipv4/conf/*/accept_source_route
> do
> echo 0 > $f
> done
> fi
>
> # Do not respond to 'redirected' packets
> if [ -e /proc/sys/net/ipv4/secure_redirects ]; then
> echo 0 > /proc/sys/net/ipv4/secure_redirects
> fi
>
> # Do not reply to 'redirected' packets if requested
> if [ -e /proc/sys/net/ipv4/send_redirects ]; then
> echo 0 > /proc/sys/net/ipv4/send_redirects
> fi
>
> # Do not reply to 'proxyarp' packets
> if [ -e /proc/sys/net/ipv4/proxy_arp ]; then
> echo 0 > /proc/sys/net/ipv4/proxy_arp
> fi
>
> # Set FIB model to be RFC1812 Compliant
> # (certain policy based routers may break with this - if you find
> # that you can't access certain hosts on your network - please set
> # this option to '0' - which is the default)
>
> if [ -e /proc/sys/net/ipv4/ip_fib_model ]; then
> echo 2 > /proc/sys/net/ipv4/ip_fib_model
> fi
>
> # --------( Sysctl Tuning - ICMP/IGMP Parameters )--------
>
> # ICMP Dead Error Messages protection
> if [ -e /proc/sys/net/ipv4/icmp_ignore_bogus_error_responses ]; then
> echo 1 > /proc/sys/net/ipv4/icmp_ignore_bogus_error_responses
> fi
>
> # ICMP Broadcasting protection
> if [ -e /proc/sys/net/ipv4/icmp_echo_ignore_broadcasts ]; then
> echo 1 > /proc/sys/net/ipv4/icmp_echo_ignore_broadcasts
> fi
>
> # IGMP Membership 'overflow' protection
> # (if you are planning on running your box as a router - you should either
> # set this option to a number greater than 5, or disable this protection
> # altogether by commenting out this option)
>
> if [ -e /proc/sys/net/ipv4/igmp_max_memberships ]; then
> echo 1 > /proc/sys/net/ipv4/igmp_max_memberships
> fi
>
>
> # --------( Sysctl Tuning - Miscellanous Parameters )--------
>
> # Set TTL to '64' hops
> # (If you are running a masqueraded network, or use policy-based
> # routing - you may want to increase this value depending on the load
> # on your link.)
>
> if [ -e /proc/sys/net/ipv4/conf/all/ip_default_ttl ]; then
> for f in /proc/sys/net/ipv4/conf/*/ip_default_ttl
> do
> echo 64 > $f
> done
> fi
>
> # Always defragment incoming packets
> # (Some cable modems [ Optus @home ] will suffer intermittent connection
> # droputs with this setting. If you experience problems, set this to '0')
>
> if [ -e /proc/sys/net/ipv4/ip_always_defrag ]; then
> echo 1 > /proc/sys/net/ipv4/ip_always_defrag
> fi
>
> # Keep packet fragments in memory for 8 seconds
> # (Note - this option has no affect if you turn packet defragmentation
> # (above) off!)
>
> if [ -e /proc/sys/net/ipv4/ipfrag_time ]; then
> echo 8 > /proc/sys/net/ipv4/ipfrag_time
> fi
>
> # Do not reply to Address Mask Notification Warnings
> # (If you are using your machine as a DMZ router or a PPP dialin server
> # that relies on proxy_arp requests to provide addresses to it's clients
> # you may wish to disable this option by setting the value to '1'
>
> if [ -e /proc/sys/net/ipv4/ip_addrmask_agent ]; then
> echo 0 > /proc/sys/net/ipv4/ip_addrmask_agent
> fi
>
> if [ "$EXT_PPP" = "on" ]; then
> # Turn on dynamic TCP/IP address hacking
> # (Some broken PPPoE clients require this option to be enabled)
> if [ -e /proc/sys/net/ipv4/ip_dynaddr ]; then
> echo 1 > /proc/sys/net/ipv4/ip_dynaddr
> fi
> else
> if [ -e /proc/sys/net/ipv4/ip_dynaddr ]; then
> echo 0 > /proc/sys/net/ipv4/ip_dynaddr
> fi
> fi
> # --------( Sysctl Tuning - IPTables Specific Parameters )--------
>
> # Doubling current limit for ip_conntrack
> if [ -e /proc/sys/net/ipv4/ip_conntrack_max ]; then
> echo 16384 > /proc/sys/net/ipv4/ip_conntrack_max
> fi
Follow the steps below to install pdsh in your IBM Platform PCM/HPC 3.2 cluster.
If you encounter a problem with the steps below, you can open a service request with IBM Support.
For pdsh usage issues, please refer to the pdsh man page or online documentation.
To install and setup pdsh, follow these steps:
0. Prerequisites
The only prerequisite is that your cluster management node must have access to the internet, to reach the
EPEL software repository.
1. Set up the EPEL yum repository on your RHEL 6.2 installer node
1.1 Grab a URL for epel-release package
You should confirm that your PCM installer is running RHEL 6.2 by checking the redhat-release
file.
# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 6.2 (Santiago)
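The steps for grabbing and installing the epel-release package are not shown above; a typical sequence on RHEL 6 looks like the following. The exact URL and package version are assumptions and should be checked against the current EPEL mirror layout:

```shell
# Download and install the epel-release package for RHEL/CentOS 6
# (URL and version are illustrative; verify against the EPEL repository first)
rpm -Uvh https://archives.fedoraproject.org/pub/archive/epel/6/x86_64/epel-release-6-8.noarch.rpm
```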
Modify the /etc/yum.repos.d/epel.repo file and make sure to set enabled=1 for the "epel"
repository. Do not enable the "epel-debuginfo" and "epel-source" repositories.
1.4 Confirm that EPEL repository is available via yum
# yum repolist
2. Install PDSH
2.1 Use yum to install the pdsh package
# yum -y install pdsh
# yum install pdsh-rcmd-rsh.x86_64
2.2 Confirm that pdsh is installed
# which pdsh
3. Configure PDSH
3.1 Create machines file for pdsh
# mkdir /etc/pdsh
# touch /etc/pdsh/machines
# genconfig hostspdsh > /etc/pdsh/machines
3.2 Configure user environment for PDSH
Open /etc/bashrc file and add following lines at the end
# setup pdsh for cluster users
export PDSH_RCMD_TYPE='ssh'
export WCOLL='/etc/pdsh/machines'
4. Use PDSH
Now, pdsh is set up for cluster users, similar to the previous version of IBM Platform
HPC. To use pdsh, simply run the 'pdsh' command:
# setup pdsh for cluster users
export PDSH_RCMD_TYPE='ssh'
export WCOLL='/etc/pdsh/machines'
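With those two variables in place, a quick way to verify the setup is to run a simple command across the whole machines list and group the output with dshbak. This is a sketch; it assumes passwordless ssh to all hosts is already working:

```shell
# PDSH_RCMD_TYPE=ssh and WCOLL=/etc/pdsh/machines are assumed to be set,
# so pdsh needs no -w option: it reads its target list from $WCOLL.
pdsh uptime | dshbak -c
```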
5. Put the host names of the Compute Nodes into the machines file
# vim /etc/pdsh/machines
node1
node2
node3
.......
.......
6. Make sure the nodes have their SSH-Key Exchange. For more information, see
Auto SSH Login
without Password.
7. Do Install Step 1 to Step 3 on ALL the client nodes.
B. USING PDSH
Run the command: pdsh [options]... command
1. To target all the nodes found in /etc/pdsh/machines. Assuming the files are transferred already.
Do note that the parallel copy comes with the pdsh utilities
# pdsh -a "rpm -Uvh /root/htop-1.0.2-1.el6.rf.x86_64.rpm"
2. To target specific nodes, you may want to consider using the -x option
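A sketch of the -x form, excluding one node from the run (the host name node1 is illustrative):

```shell
# Run against every host in the machines file except node1
pdsh -a -x node1 "rpm -Uvh /root/htop-1.0.2-1.el6.rf.x86_64.rpm"
```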
"... A very common way of using pdsh is to set the environment variable WCOLL to point to the file that contains the list of hosts you want to use in the pdsh command. For example, I created a subdirectory PDSH where I create a file hosts that lists the hosts I want to use ..."
The -w option means I am specifying the node(s) that will run the command. In this case, I specified the IP address
of the node (192.168.1.250). After the list of nodes, I add the command I want to run, which is uname -r in this case.
Notice that pdsh starts the output line by identifying the node name.
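The command being described is not reproduced above; reconstructed from the surrounding text (and with the kernel version taken from the listings elsewhere in this article), it would be:

```shell
$ pdsh -w 192.168.1.250 uname -r
192.168.1.250: 2.6.32-431.11.2.el6.x86_64
```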
If you need to mix rcmd modules in a single command, you can specify which module to use in the command line,
by putting the rcmd module before the node name. In this case, I used ssh and typical ssh syntax.
A very common way of using pdsh is to set the environment variable WCOLL to point to the file that contains the list
of hosts you want to use in the pdsh command. For example, I created a subdirectory PDSH where I create a file
hosts that lists the hosts I want to use:
[laytonjb@home4 ~]$ mkdir PDSH
[laytonjb@home4 ~]$ cd PDSH
[laytonjb@home4 PDSH]$ vi hosts
[laytonjb@home4 PDSH]$ more hosts
192.168.1.4
192.168.1.250
I'm only using two nodes: 192.168.1.4 and 192.168.1.250. The first is my test system (like a cluster head node), and the second
is my test compute node. You can put hosts in the file as you would on the command line separated by commas. Be sure not to put a
blank line at the end of the file because pdsh will try to connect to it. You can put the environment variable WCOLL
in your .bashrc file:
export WCOLL=/home/laytonjb/PDSH/hosts
As before, you can source your .bashrc file, or you can log out and log back in. Specifying Hosts
I won't list all the several other ways to specify a list of nodes, because the pdsh website
[9] discusses virtually
all of them; however, some of the methods are pretty handy. The simplest way to specify the nodes on the command line is to use
the -w option:
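For example (reconstructed; the kernel versions match the listings elsewhere in this article):

```shell
$ pdsh -w 192.168.1.4,192.168.1.250 uname -r
192.168.1.4: 2.6.32-431.17.1.el6.x86_64
192.168.1.250: 2.6.32-431.11.2.el6.x86_64
```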
In this case, I specified the node names separated by commas. You can also use a range of hosts as follows:
pdsh -w host[1-11]
pdsh -w host[1-4,8-11]
In the first case, pdsh expands the host range to host1, host2, host3, ..., host11. In the second case, it expands the hosts similarly
(host1, host2, host3, host4, host8, host9, host10, host11). You can go to the pdsh website for more information on hostlist expressions
[10].
Another option is to have pdsh read the hosts from a file other than the one to which WCOLL points. The command shown in
Listing 2 tells
pdsh to take the hostnames from the file /tmp/hosts, which is listed after -w ^ (with no space between
the "^" and the filename). You can also use several host files,
Listing 2 Read Hosts from File
$ more /tmp/hosts
192.168.1.4
$ more /tmp/hosts2
192.168.1.250
$ pdsh -w ^/tmp/hosts,^/tmp/hosts2 uname -r
192.168.1.4: 2.6.32-431.17.1.el6.x86_64
192.168.1.250: 2.6.32-431.11.2.el6.x86_64
The option -w -192.168.1.250 excluded node 192.168.1.250 from the list and only output the information for 192.168.1.4.
You can also exclude nodes using a node file:
or a list of hostnames to be excluded from the command to run also works.
More Useful pdsh Commands
Now I can shift into second gear and try some fancier pdsh tricks. First, I want to run a more complicated command on all of the
nodes ( Listing 3
). Notice that I put the entire command in quotes. This means the entire command is run on each node, including the first (
cat /proc/cpuinfo ) and second ( grep bogomips ) parts.
Listing 3 Quotation Marks 1
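Listing 3 itself is not reproduced here; based on the description, it takes this form (a sketch):

```shell
# The whole pipeline is quoted, so both cat and grep run on each remote node
$ pdsh -w 192.168.1.4,192.168.1.250 "cat /proc/cpuinfo | grep bogomips"
```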
In the output, the node precedes the command results, so you can tell what output is associated with which node. Notice that the
BogoMips values are different on the two nodes, which is perfectly understandable because the systems are different. The first node
has eight cores (four cores and four Hyper-Thread cores), and the second node has four cores.
You can use this command across a homogeneous cluster to make sure all the nodes are reporting back the same BogoMips value. If
the cluster is truly homogeneous, this value should be the same. If it's not, then I would take the offending node out of production
and check it.
A slightly different command shown in
Listing 4 runs
the first part contained in quotes, cat /proc/cpuinfo , on each node and the second part of the command, grep
bogomips , on the node on which you issue the pdsh command.
Listing 4 Quotation Marks 2
The point here is that you need to be careful on the command line. In this example, the differences are trivial, but other commands
could have differences that might be difficult to notice.
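The contrast described above can be sketched as follows (Listing 4 is not reproduced here):

```shell
# Only the quoted part runs remotely; grep runs locally on the combined output
$ pdsh -w 192.168.1.4,192.168.1.250 "cat /proc/cpuinfo" | grep bogomips
```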
One very important thing to note is that pdsh does not guarantee a return of output in any particular order. If you have a list
of 20 nodes, the output does not necessarily start with node 1 and increase incrementally to node 20. For example, in
Listing 5 , I run
vmstat on each node and get three lines of output from each node.
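Because the ordering is arbitrary, piping the output through dshbak restores a per-host grouping (a sketch; the host range is illustrative):

```shell
# dshbak collects the interleaved lines and groups them by host name
$ pdsh -w host[1-20] vmstat | dshbak
```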
In this series of blog posts I'm taking a look at a few very useful tools that can make your
life as the sysadmin of a cluster of Linux machines easier. This may be a Hadoop cluster, or
just a plain simple set of 'normal' machines on which you want to run the same commands and
monitoring.
Previously we looked at using SSH keys for
intra-machine authorisation, which is a pre-requisite for what we'll look at here -- executing
the same command across multiple machines using PDSH. In the next post of the series we'll see
how we can monitor OS metrics across a cluster with colmux.
PDSH is a very smart little tool that enables you to issue the same command on multiple
hosts at once, and see the output. You need to have set up ssh key authentication from the
client to host on all of them, so if you followed the steps in the first section of this
article you'll be good to go.
The syntax for using it is nice and simple:
-w specifies the addresses. You can use numerical ranges [1-4]
and/or comma-separated lists of hosts. If you want to connect as a user other than the
current user on the calling machine, you can specify it here (or as a separate
-l argument)
After that is the command to run.
For example run against a small cluster of four machines that I have:
robin@RNMMBP $ pdsh -w root@rnmcluster02-node0[1-4] date
rnmcluster02-node01: Fri Nov 28 17:26:17 GMT 2014
rnmcluster02-node02: Fri Nov 28 17:26:18 GMT 2014
rnmcluster02-node03: Fri Nov 28 17:26:18 GMT 2014
rnmcluster02-node04: Fri Nov 28 17:26:18 GMT 2014
... ... ...
Example - install and start collectl on all nodes
I started looking into pdsh when it came to setting up a cluster of machines from scratch.
One of the must-have tools I like to have on any machine that I work with is the excellent
collectl .
This is an OS resource monitoring tool that I initially learnt of through Kevin Closson and Greg Rahn , and provides the kind of information you'd get
from top etc – and then some! It can run interactively, log to disk, run as a service
– and it also happens to integrate
very nicely with graphite , making it a no-brainer choice for any server.
So, instead of logging into each box individually I could instead run this:
pdsh -w root@rnmcluster02-node0[1-4] yum install -y collectl
pdsh -w root@rnmcluster02-node0[1-4] service collectl start
pdsh -w root@rnmcluster02-node0[1-4] chkconfig collectl on
Yes, I know there are tools out there like puppet and chef that are designed for doing this
kind of templated build of multiple servers, but the point I want to illustrate here is that
pdsh enables you to do ad-hoc changes to a set of servers at once. Sure, once I have my cluster
built and want to create an image/template for future builds, then it would be daft if
I were building the whole lot through pdsh-distributed yum commands.
Example - setting up
the date/timezone/NTPD
Often the accuracy of the clock on each server in a cluster is crucial, and we can easily do
this with pdsh:
robin@RNMMBP ~ $ pdsh -w root@rnmcluster02-node0[1-4] ntpdate pool.ntp.org
rnmcluster02-node03: 30 Nov 20:46:22 ntpdate[27610]: step time server 176.58.109.199 offset -2.928585 sec
rnmcluster02-node02: 30 Nov 20:46:22 ntpdate[28527]: step time server 176.58.109.199 offset -2.946021 sec
rnmcluster02-node04: 30 Nov 20:46:22 ntpdate[27615]: step time server 129.250.35.250 offset -2.915713 sec
rnmcluster02-node01: 30 Nov 20:46:25 ntpdate[29316]: 178.79.160.57 rate limit response from server.
rnmcluster02-node01: 30 Nov 20:46:22 ntpdate[29316]: step time server 176.58.109.199 offset -2.925016 sec
Set NTPD to start automatically at boot:
robin@RNMMBP ~ $ pdsh -w root@rnmcluster02-node0[1-4] chkconfig ntpd on
Start NTPD:
robin@RNMMBP ~ $ pdsh -w root@rnmcluster02-node0[1-4] service ntpd start
Example - using a HEREDOC (here-document) and sending quotation marks in a command with
PDSH
Here documents
(heredocs) are a nice way to embed multi-line content in a single command, enabling the
scripting of a file creation rather than the clumsy instruction to " open an editor and
paste the following lines into it and save the file as /foo/bar ".
Fortunately heredocs work just fine with pdsh, so long as you remember to enclose the whole
command in quotation marks. And speaking of which, if you need to include quotation marks in
your actual command, you need to escape them with a backslash. Here's an example of both,
setting up the configuration file for my ever-favourite gnu screen on all the nodes of the
cluster:
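The example itself is not reproduced above; a sketch of the same idea follows, combining a heredoc with escaped quotation marks (the .screenrc contents are illustrative):

```shell
# The whole command is quoted, so the heredoc runs on each remote node;
# quotation marks inside the command are escaped with backslashes
pdsh -w root@rnmcluster02-node0[1-4] "cat > ~/.screenrc <<EOF
hardstatus alwayslastline
hardstatus string \"%{= kG}[ %H ] %{= kw}%c\"
EOF"
```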
Now when I login to each individual node and run screen, I get a nice toolbar at the
bottom:
Combining
commands
To combine commands together that you send to each host you can use the standard bash
operator semicolon ;
robin@RNMMBP ~ $ pdsh -w root@rnmcluster02-node0[1-4] "date;sleep 5;date"
rnmcluster02-node01: Sun Nov 30 20:57:06 GMT 2014
rnmcluster02-node03: Sun Nov 30 20:57:06 GMT 2014
rnmcluster02-node04: Sun Nov 30 20:57:06 GMT 2014
rnmcluster02-node02: Sun Nov 30 20:57:06 GMT 2014
rnmcluster02-node01: Sun Nov 30 20:57:11 GMT 2014
rnmcluster02-node03: Sun Nov 30 20:57:11 GMT 2014
rnmcluster02-node04: Sun Nov 30 20:57:11 GMT 2014
rnmcluster02-node02: Sun Nov 30 20:57:11 GMT 2014
Note the use of the quotation marks to enclose the entire command string. Without them the
bash interpreter will take the ; as the delimiter of the local commands,
and try to run the subsequent commands locally:
robin@RNMMBP ~ $ pdsh -w root@rnmcluster02-node0[1-4] date;sleep 5;date
rnmcluster02-node03: Sun Nov 30 20:57:53 GMT 2014
rnmcluster02-node04: Sun Nov 30 20:57:53 GMT 2014
rnmcluster02-node02: Sun Nov 30 20:57:53 GMT 2014
rnmcluster02-node01: Sun Nov 30 20:57:53 GMT 2014
Sun 30 Nov 2014 20:58:00 GMT
You can also use && and || to run subsequent commands
conditionally if the previous one succeeds or fails respectively:
robin@RNMMBP $ pdsh -w root@rnmcluster02-node[01-4] "chkconfig collectl on && service collectl start"
rnmcluster02-node03: Starting collectl: [ OK ]
rnmcluster02-node02: Starting collectl: [ OK ]
rnmcluster02-node04: Starting collectl: [ OK ]
rnmcluster02-node01: Starting collectl: [ OK ]
Piping and file redirects
Similar to combining commands above, you can pipe the output of commands, and you need to
use quotation marks to enclose the whole command string.
The difference is that you'll be shifting the whole of the pipe across the network in order
to process it locally, so if you're just grepping etc this doesn't make any sense. For use of
utilities held locally and not on the remote server though, this might make sense.
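A sketch of the two forms:

```shell
# Quoted: grep runs on each remote node, so only matching lines cross the network
pdsh -w root@rnmcluster02-node0[1-4] "ps aux | grep [n]tpd"

# Unquoted: the full ps output from every node is shipped back and
# grep runs locally on the calling machine
pdsh -w root@rnmcluster02-node0[1-4] ps aux | grep ntpd
```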
File redirects work the same way – within quotation marks and the redirect will be to
a file on the remote server, outside of them it'll be local:
robin@RNMMBP ~ $ pdsh -w root@rnmcluster02-node[01-4] "chkconfig>/tmp/pdsh.out"
robin@RNMMBP ~ $ ls -l /tmp/pdsh.out
ls: /tmp/pdsh.out: No such file or directory
robin@RNMMBP ~ $ pdsh -w root@rnmcluster02-node[01-4] chkconfig>/tmp/pdsh.out
robin@RNMMBP ~ $ ls -l /tmp/pdsh.out
-rw-r--r-- 1 robin wheel 7608 30 Nov 19:23 /tmp/pdsh.out
Cancelling PDSH operations
As you can see from above, the precise syntax of pdsh calls can be hugely important. If you
run a command and it appears 'stuck', or if you have that heartstopping realisation that the
shutdown -h now you meant to run locally you ran across the cluster, you can press
Ctrl-C once to see the status of your commands:
robin@RNMMBP ~ $ pdsh -w root@rnmcluster02-node[01-4] sleep 30
^Cpdsh@RNMMBP: interrupt (one more within 1 sec to abort)
pdsh@RNMMBP: (^Z within 1 sec to cancel pending threads)
pdsh@RNMMBP: rnmcluster02-node01: command in progress
pdsh@RNMMBP: rnmcluster02-node02: command in progress
pdsh@RNMMBP: rnmcluster02-node03: command in progress
pdsh@RNMMBP: rnmcluster02-node04: command in progress
and press it twice (or within a second of the first) to cancel:
robin@RNMMBP ~ $ pdsh -w root@rnmcluster02-node[01-4] sleep 30
^Cpdsh@RNMMBP: interrupt (one more within 1 sec to abort)
pdsh@RNMMBP: (^Z within 1 sec to cancel pending threads)
pdsh@RNMMBP: rnmcluster02-node01: command in progress
pdsh@RNMMBP: rnmcluster02-node02: command in progress
pdsh@RNMMBP: rnmcluster02-node03: command in progress
pdsh@RNMMBP: rnmcluster02-node04: command in progress
^Csending SIGTERM to ssh rnmcluster02-node01
sending signal 15 to rnmcluster02-node01 [ssh] pid 26534
sending SIGTERM to ssh rnmcluster02-node02
sending signal 15 to rnmcluster02-node02 [ssh] pid 26535
sending SIGTERM to ssh rnmcluster02-node03
sending signal 15 to rnmcluster02-node03 [ssh] pid 26533
sending SIGTERM to ssh rnmcluster02-node04
sending signal 15 to rnmcluster02-node04 [ssh] pid 26532
pdsh@RNMMBP: interrupt, aborting.
If you've got threads yet to run on the remote hosts, but want to keep running whatever has
already started, you can use Ctrl-C, Ctrl-Z:
robin@RNMMBP ~ $ pdsh -f 2 -w root@rnmcluster02-node[01-4] "sleep 5;date"
^Cpdsh@RNMMBP: interrupt (one more within 1 sec to abort)
pdsh@RNMMBP: (^Z within 1 sec to cancel pending threads)
pdsh@RNMMBP: rnmcluster02-node01: command in progress
pdsh@RNMMBP: rnmcluster02-node02: command in progress
^Zpdsh@RNMMBP: Canceled 2 pending threads.
rnmcluster02-node01: Mon Dec 1 21:46:35 GMT 2014
rnmcluster02-node02: Mon Dec 1 21:46:35 GMT 2014
NB the above example illustrates the use of the -f argument to limit how many
threads are run against remote hosts at once. We can see the command is left running on the
first two nodes and returns the date, whilst the Ctrl-C - Ctrl-Z stops it from being executed
on the remaining nodes.
PDSH_SSH_ARGS_APPEND
By default, when you ssh to a new host for the first time you'll be prompted to validate the
remote host's SSH key fingerprint.
The authenticity of host 'rnmcluster02-node02 (172.28.128.9)' can't be established.
RSA key fingerprint is 00:c0:75:a8:bc:30:cb:8e:b3:8e:e4:29:42:6a:27:1c.
Are you sure you want to continue connecting (yes/no)?
This is one of those prompts that the majority of us just hit enter at and ignore; if that
includes you then you will want to make sure that your PDSH call doesn't fall in a heap because
you're connecting to a bunch of new servers all at once. PDSH is not an interactive tool, so if
it requires input from the hosts it's connecting to it'll just fail. To avoid this SSH prompt,
you can set up the environment variable PDSH_SSH_ARGS_APPEND as follows:
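The export itself is not shown above; from the description of the options that follows, it would be along these lines:

```shell
# -q quietens failures; the two -o options disable host key checking and
# stop ssh from recording fingerprints (by pointing the file at /dev/null)
export PDSH_SSH_ARGS_APPEND="-q -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null"
```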
The -q makes failures less verbose, and the -o passes in a couple
of options, StrictHostKeyChecking to disable the above check, and
UserKnownHostsFile to stop SSH keeping a list of host IP/hostnames and
corresponding SSH fingerprints (by pointing it at /dev/null ). You'll want this if
you're working with VMs that are sharing a pool of IPs and get re-used, otherwise you get this
scary failure:
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@ WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that a host key has just been changed.
The fingerprint for the RSA key sent by the remote host is
00:c0:75:a8:bc:30:cb:8e:b3:8e:e4:29:42:6a:27:1c.
Please contact your system administrator.
For both of these above options, make sure you're aware of the security implications that
you're opening yourself up to. For a sandbox environment I just ignore them; for anything where
security is of importance make sure you are aware of quite which server you are connecting to
by SSH, and protecting yourself from MitM attacks.
When working with multiple Linux machines I would first and foremost make sure SSH
keys are set up in order to ease management through password-less logins.
After SSH keys, I would recommend pdsh for parallel execution of the same SSH command across
the cluster. It's a big time saver particularly when initially setting up the cluster given the
installation and configuration changes that are inevitably needed.
In the next article of this series we'll see how the tool colmux is a powerful way to
monitor OS metrics across a cluster.
So now your turn – what particular tools or tips do you have for working with a
cluster of Linux machines? Leave your answers in the comments below, or tweet them to me at
@rmoff .
The cluster comes with a simple parallel shell named pdsh. The pdsh shell is
handy for running commands across the cluster. See the man page, which describes the capabilities
of pdsh in detail. One of the useful features is the capability of specifying all or a subset
of the cluster.
For example:
pdsh -a <command> targets the <command> to all nodes of the cluster, including
the master.
pdsh -a -x node00 <command> targets the <command> to all nodes of the cluster
except the master.
pdsh -w node[01-08] <command> targets the <command> to the 8 nodes of the cluster
named node01, node02, ..., node08.
Another utility that is useful for formatting the output of pdsh is dshbak.
Here we will show some handy uses of pdsh.
Show the current date and time on all nodes of the cluster:
pdsh -a date
Show the current load and system uptime for all nodes of the cluster:
pdsh -a uptime
Show all processes with the substring mpd in their name on the cluster:
pdsh -a ps augx | grep mpd
Clean up MPI files and sockets from all nodes in the system. This can be handy in removing
leftover files from an earlier program or system crash:
pdsh -a mpdcleanup
Remove all instances of pvm temporary files from the cluster. This can be handy in removing
leftover files from an earlier program or system crash:
pdsh -a /bin/rm -f /tmp/pvm*
The utility dshbak formats the output from pdsh by consolidating the output
from each node. The option -c shows identical output from different nodes just once.
pdsh -a ls -l /etc/ntp | dshbak -c
Here is a sample output:
[amit@onyx amit]$ pdsh -a ls -l /etc/ntp | dshbak -c
----------------
ws[01-16]
----------------
total 16
-rw-r--r-- 1 root root 8 Jun 4 11:53 drift
-rw------- 1 root root 266 Jun 4 11:53 keys
-rw-r--r-- 1 root root 13 Jun 4 11:53 ntpservers
-rw-r--r-- 1 root root 13 Jun 4 11:53 step-tickers
----------------
ws00
----------------
total 16
-rw-r--r-- 1 ntp ntp 8 Sep 5 21:51 drift
-rw------- 1 ntp ntp 266 Feb 13 2003 keys
-rw-r--r-- 1 root root 58 Oct 3 2003 ntpservers
-rw-r--r-- 1 ntp ntp 23 Oct 3 2003 step-tickers
----------------
ws[17-32]
----------------
total 16
-rw-r--r-- 1 root root 8 May 27 13:31 drift
-rw------- 1 root root 266 May 27 13:31 keys
-rw-r--r-- 1 root root 13 May 27 13:31 ntpservers
-rw-r--r-- 1 root root 13 May 27 13:31 step-tickers
[amit@onyx amit]$
Ever have a multitude of hosts you need to run a command (or series of commands) on? We all
know that for-loop outputs are super fun to parse through when you need to do this, but why not do
it better with a tool like pdsh.
A respected ex-colleague of mine made a great suggestion to start
using pdsh instead of for-loops and other creative makeshift parallel shell processing. The
majority of my notes in this blog post are from him. If he'll allow me to, I'll give him a
shout out and cite his Google+ profile for anyone interested.
Pdsh is a parallel remote shell client available from sourceforge. If you are using
rpmforge CentOS repos
you can pick it up there as well, but it may not be the most bleeding edge package available.
Pdsh
lives on sourceforge, but the code is on google:
Obviously if you have Puppet Enterprise
fully integrated within your environment, you can take advantage of powerful tools such as
mcollective. If you do not, pdsh is a great
alternative.
Building and installing pdsh is really simple if you've built code using GNU autoconf before.
The steps are quite easy:
./configure --with-ssh --without-rsh
make
make install
This puts the binaries into /usr/local/, which is fine for testing purposes. For
production work, I would put it in /opt or something like that – just be sure it's in
your path.
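For example, to install under /opt instead (the prefix path is illustrative):

```shell
# Install into /opt/pdsh rather than the default /usr/local
./configure --prefix=/opt/pdsh --with-ssh --without-rsh
make
make install
# then add /opt/pdsh/bin to PATH
```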
You might notice that I used the --without-rsh option in the configure
command. By default, pdsh uses rsh, which is not really secure, so I chose to exclude
it from the configuration. In the output in
Listing 1, you can see the pdsh rcmd modules (rcmd is the remote command used by
pdsh). Notice that the "available rcmd modules" at the end of the output lists only ssh and exec.
If I didn't exclude rsh, it would be listed here, too, and it would be the default. To override rsh
and make ssh the default, you just add the following line to your .bashrc file:
Listing 1
rcmd Modules
[laytonjb@home4 ~]$ pdsh -v
pdsh: invalid option -- 'v'
Usage: pdsh [-options] command ...
-S return largest of remote command return values
-h output usage menu and quit
-V output version information and quit
-q list the option settings and quit
-b disable ^C status feature (batch mode)
-d enable extra debug information from ^C status
-l user execute remote commands as user
-t seconds set connect timeout (default is 10 sec)
-u seconds set command timeout (no default)
-f n use fanout of n nodes
-w host,host,... set target node list on command line
-x host,host,... set node exclusion list on command line
-R name set rcmd module to name
-M name,... select one or more misc modules to initialize first
-N disable hostname: labels on output lines
-L list info on all loaded modules and exit
available rcmd modules: ssh,exec (default: ssh)
export PDSH_RCMD_TYPE=ssh
Be sure to "source" your .bashrc file (i.e., source .bashrc) to set
the environment variable. You can also log out and log back in. If, for some reason, you see the
following when you try running pdsh,
then you have built it with rsh. You can either rebuild pdsh without rsh, or you can use the environment
variable in your .bashrc file, or you can do both.
First pdsh Commands
To begin, I'll try to get the kernel version of a node by using its IP address:
The -w option means I am specifying the node(s) that will run the command. In this
case, I specified the IP address of the node (192.168.1.250). After the list of nodes, I add the
command I want to run, which is uname -r in this case. Notice that pdsh starts the output
line by identifying the node name.
If you need to mix rcmd modules in a single command, you can specify which module to use in the
command line,
by putting the rcmd module before the node name. In this case, I used ssh and typical ssh syntax.
A very common way of using pdsh is to set the environment variable WCOLL to point
to the file that contains the list of hosts you want to use in the pdsh command. For example, I created
a subdirectory PDSH where I create a file hosts that lists the hosts I
want to use:
[laytonjb@home4 ~]$ mkdir PDSH
[laytonjb@home4 ~]$ cd PDSH
[laytonjb@home4 PDSH]$ vi hosts
[laytonjb@home4 PDSH]$ more hosts
192.168.1.4
192.168.1.250
I'm only using two nodes: 192.168.1.4 and 192.168.1.250. The first is my test system (like a cluster
head node), and the second is my test compute node. You can put hosts in the file as you would on
the command line separated by commas. Be sure not to put a blank line at the end of the file because
pdsh will try to connect to it. You can put the environment variable WCOLL in your
.bashrc file:
export WCOLL=/home/laytonjb/PDSH/hosts
As before, you can source your .bashrc file, or you can log out and log back in.
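The setup above can also be scripted. A minimal sketch, using the article's example paths and addresses (adjust both for your own cluster):

```shell
# Create the hosts file non-interactively and point WCOLL at it.
# printf writes one host per line with no trailing blank line, which
# matters because pdsh would try to connect to an empty entry.
mkdir -p "$HOME/PDSH"
printf '%s\n' 192.168.1.4 192.168.1.250 > "$HOME/PDSH/hosts"
export WCOLL="$HOME/PDSH/hosts"
cat "$WCOLL"
# → 192.168.1.4
# → 192.168.1.250
```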
Specifying Hosts
I won't list all of the other ways to specify a list of nodes, because the pdsh website
[9] discusses virtually all of them; however, some of the methods are pretty handy. The simplest
way to specify the nodes on the command line is to use the -w option:
In this case, I specified the node names separated by commas. You can also use a range of hosts
as follows:
pdsh -w host[1-11]
pdsh -w host[1-4,8-11]
In the first case, pdsh expands the host range to host1, host2, host3, …, host11. In the second
case, it expands the hosts similarly (host1, host2, host3, host4, host8, host9, host10, host11).
You can go to the pdsh website for more information on hostlist expressions
[10].
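The result of this expansion can be sketched in a few lines of shell. This is only an illustration of what pdsh produces, not pdsh's actual implementation; pdsh handles these expressions natively, including zero-padded and nested forms this sketch ignores:

```shell
# Expand a simple hostlist expression like host[1-4,8-11] into host names.
expand_hostlist() {
  expr=$1
  prefix=${expr%%\[*}                       # text before the bracket
  ranges=${expr#*\[}; ranges=${ranges%\]}   # comma-separated range list
  out=""
  oldifs=$IFS; IFS=','
  for r in $ranges; do
    IFS=$oldifs
    case $r in
      *-*) lo=${r%-*}; hi=${r#*-}           # a range such as 8-11
           i=$lo
           while [ "$i" -le "$hi" ]; do out="$out $prefix$i"; i=$((i+1)); done ;;
      *)   out="$out $prefix$r" ;;          # a single number
    esac
  done
  IFS=$oldifs
  echo $out
}

expand_hostlist 'host[1-4,8-11]'
# → host1 host2 host3 host4 host8 host9 host10 host11
```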
Another option is to have pdsh read the hosts from a file other than the one to which WCOLL points.
The command shown in
Listing 2 tells pdsh to take the hostnames from the file /tmp/hosts, which is listed
after -w ^ (with no space between the "^" and the filename). You can also use several
host files by separating them with commas:
pdsh -w ^/tmp/hosts,^/tmp/hosts2 uname -r
The option -w -192.168.1.250 excluded node 192.168.1.250 from the list, so pdsh only output
the information for 192.168.1.4. You can also exclude nodes using a node file (-x ^filename)
or a comma-separated list of hostnames passed to the -x option.
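The exclusion forms can be sketched as follows. The pdsh lines (shown as comments, with hypothetical file names) need reachable hosts, so the runnable part only mimics the list filtering that exclusion performs:

```shell
# pdsh exclusion forms (comments only; they require real hosts):
#   pdsh -w ^/tmp/hosts -x 192.168.1.250 uname -r    # exclude one node
#   pdsh -w ^/tmp/hosts -x ^/tmp/bad_nodes uname -r  # exclude a file of nodes
# The effect on the target list is equivalent to filtering it locally:
printf '%s\n' 192.168.1.4 192.168.1.250 > /tmp/hosts.demo
grep -vx '192.168.1.250' /tmp/hosts.demo
# → 192.168.1.4
rm -f /tmp/hosts.demo
```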
More Useful pdsh Commands
Now I can shift into second gear and try some fancier pdsh tricks. First, I want to run a more
complicated command on all of the nodes (Listing
3). Notice that I put the entire command in quotes. This means the entire command is run on each
node, including the first (cat /proc/cpuinfo) and second (grep bogomips)
parts.
In the output, the node precedes the command results, so you can tell what output is associated
with which node. Notice that the BogoMips values are different on the two nodes, which is perfectly
understandable because the systems are different. The first node has eight cores (four cores and
four Hyper-Thread cores), and the second node has four cores.
You can use this command across a homogeneous cluster to make sure all the nodes are reporting
back the same BogoMips value. If the cluster is truly homogeneous, this value should be the same.
If it's not, then I would take the offending node out of production and check it.
A slightly different command, shown in
Listing 4, runs the first part contained in quotes, cat /proc/cpuinfo, on each node
and the second part of the command, grep bogomips, on the node on which you issue the
pdsh command.
The point here is that you need to be careful on the command line. In this example, the differences
are trivial, but other commands could have differences that might be difficult to notice.
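The difference can be demonstrated without a cluster by standing in for one pdsh target with a local subshell. The fake_pdsh function below is purely illustrative (it is not part of pdsh); it mimics pdsh's habit of prefixing each output line with "host: ":

```shell
# fake_pdsh: run its argument in a subshell (the "remote" side) and label
# each output line the way pdsh labels output with "host: ".
fake_pdsh() { sh -c "$1" | sed 's/^/node1: /'; }

# Whole pipeline quoted: grep runs on the "remote" side, against the raw
# lines, so an anchored pattern matches.
fake_pdsh 'printf "model name : X\nbogomips : 5986\n" | grep ^bogomips'
# → node1: bogomips : 5986

# Pipe outside the quotes: grep runs locally, against the labeled lines,
# so the same anchored pattern no longer matches anything.
fake_pdsh 'printf "model name : X\nbogomips : 5986\n"' | grep ^bogomips || true
# → (no output)
```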
One very important thing to note is that pdsh does not guarantee a return of output in any particular
order. If you have a list of 20 nodes, the output does not necessarily start with node 1 and increase
incrementally to node 20. For example, in
Listing 5, I run vmstat on each node and get three lines of output from each node.
Listing 5
Order of Output
[laytonjb@home4 ~]$ pdsh vmstat 1 2
192.168.1.4: procs ------------memory------------ ---swap-- -----io---- --system-- -----cpu-----
192.168.1.4: r b swpd free buff cache si so bi bo in cs us sy id wa st
192.168.1.4: 1 0 0 30198704 286340 751652 0 0 2 3 48 66 1 0 98 0 0
192.168.1.250: procs -----------memory------------ ---swap-- -----io---- --system-- ------cpu------
192.168.1.250: r b swpd free buff cache si so bi bo in cs us sy id wa st
192.168.1.250: 0 0 0 7248836 25632 79268 0 0 14 2 22 21 0 0 99 0 0
192.168.1.4: 1 0 0 30198100 286340 751668 0 0 0 0 412 735 1 0 99 0 0
192.168.1.250: 0 0 0 7249076 25632 79284 0 0 0 0 90 39 0 0 100 0 0
At first, it looks like the results from the first node are output first, but then the second
node creeps in with its results. You need to expect that the output from a command that returns more
than one line per node could be mixed. My best advice is to grab the output, put it into an editor,
and rearrange the lines, remembering that the lines for any specific node are in the correct order.
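Rather than rearranging by hand, the output can be demultiplexed mechanically. The dshbak script shipped with pdsh exists for exactly this (pdsh ... | dshbak), and a stable sort on the host prefix is a minimal stand-in, since the lines for each node are already in the correct relative order:

```shell
# Group interleaved pdsh output by host. The -s (stable) flag preserves
# the per-host line order; -t: -k1,1 sorts on the "host:" prefix only.
printf '%s\n' \
  '192.168.1.4: vmstat line 1' \
  '192.168.1.250: vmstat line 1' \
  '192.168.1.4: vmstat line 2' \
  '192.168.1.250: vmstat line 2' |
  sort -s -t: -k1,1
# → both 192.168.1.250 lines first (in order), then both 192.168.1.4 lines
```

With a real cluster this is simply pdsh ... | dshbak, which also adds per-host headers; dshbak -c additionally coalesces hosts that produced identical output.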
... ... ...
The Author
Jeff Layton has been in the HPC business for almost 25 years (starting when he was 4 years old).
He can be found lounging around at a nearby Fry's, enjoying the coffee and waiting for sales.
The Last but not Least
Technology is dominated by two types of people: those who understand what they do not manage and those who manage what they do not understand. ~ Archibald Putt, Ph.D.