May the source be with you, but remember the KISS principle ;-)
(slightly skeptical) Educational society promoting "Back to basics" movement against IT overcomplexity and  bastardization of classic Unix

Cluster management tools



The environment and tools for your cluster are as important as the OS and the hardware it runs on. Your toolset can help to answer several fundamental questions such as:

Here are some tips:



Old News ;-)

synctool Freecode

synctool is a cluster administration tool that keeps configuration files synchronized across all nodes in a cluster. Nodes may be part of a logical group or class, in which case they need a particular subset of configuration files. synctool can restart daemons when needed, if their relevant configuration files have been changed. synctool can also be used to do patch management or other system administrative tasks.
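The idea described above (files selected per node group, daemons restarted only when their configuration actually changed) can be illustrated with a minimal sketch. This is not synctool's code or its configuration format; the node names, group names, file catalog and the functions `files_for_node` and `plan_sync` are all hypothetical and exist only to show the logic.

```python
import hashlib

# Hypothetical inventory: which groups each node belongs to.
NODE_GROUPS = {
    "node01": {"all", "compute"},
    "node02": {"all", "compute"},
    "login01": {"all", "login"},
}

# Hypothetical catalog: config file -> (group that needs it, daemon using it).
CATALOG = {
    "/etc/ntp.conf": ("all", "ntpd"),
    "/etc/slurm/slurm.conf": ("compute", "slurmd"),
    "/etc/ssh/sshd_config": ("login", "sshd"),
}


def files_for_node(node):
    """Return the sorted subset of catalog files that this node needs."""
    groups = NODE_GROUPS[node]
    return sorted(path for path, (group, _) in CATALOG.items() if group in groups)


def digest(content):
    """Checksum used to decide whether two file versions differ."""
    return hashlib.sha256(content).hexdigest()


def plan_sync(node, master, deployed):
    """Return (files_to_copy, daemons_to_restart) for one node.

    master and deployed map a path to the file content as bytes;
    a path missing from deployed counts as "not yet installed".
    """
    to_copy = []
    daemons = set()
    for path in files_for_node(node):
        current = deployed.get(path)
        if current is None or digest(current) != digest(master[path]):
            to_copy.append(path)
            daemons.add(CATALOG[path][1])
    return to_copy, sorted(daemons)
```

With a master copy of every file and the node's current state as input, `plan_sync` reports only the files that differ, so an unchanged `ntp.conf` does not trigger an `ntpd` restart.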

Tags Systems Administration Command Line
Licenses GPLv2
Operating Systems Unix
Implementation Python


Fsync is a Perl script that synchronizes files between remote hosts, with functionality similar to that of the rsync and CVS packages. Since fsync is a single Perl script, setting up file synchronization on a new machine is relatively simple. Communication between the hosts is via a socket mechanism, with the remote server started by rsh, by ssh or manually. The program was written with slow modem connections in mind. Fsync supports the concept of merging differences from local/remote hosts, with hooks for tools to merge the trees. Fsync requires Perl 5.004 or newer. This program is licensed under the GNU General Public License.


ghosts (global hosts) provides a simple macro language and grouping system for large numbers of machines. Included is "gsh", a parallel-tasking shell-runner, which allows you to run automated tasks across the entire enterprise using ssh.
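A rough sketch of the two ideas mentioned here, host grouping and a parallel ssh runner, might look as follows. It does not reproduce the ghosts macro language or gsh itself: the `@group` reference syntax, the sample host names and the functions `expand`, `ssh_command` and `run_everywhere` are assumptions made for illustration. Only the `ssh -o BatchMode=yes` option is a real OpenSSH setting.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Hypothetical group table; "@name" refers to another group.
GROUPS = {
    "web": ["web01", "web02"],
    "db": ["db01"],
    "prod": ["@web", "@db"],
}


def expand(target, groups, _seen=frozenset()):
    """Expand a host name or "@group" reference into a flat host list.

    Order is preserved, duplicates are dropped, and a group that
    refers back to itself raises ValueError instead of recursing forever.
    """
    if not target.startswith("@"):
        return [target]
    name = target[1:]
    if name in _seen:
        raise ValueError(f"group cycle at {name!r}")
    hosts = []
    for member in groups[name]:
        for host in expand(member, groups, _seen | {name}):
            if host not in hosts:
                hosts.append(host)
    return hosts


def ssh_command(host, command):
    """Build the argument list for one remote invocation."""
    # BatchMode=yes makes ssh fail rather than prompt for a password.
    return ["ssh", "-o", "BatchMode=yes", host, command]


def run_everywhere(target, command, groups, build=ssh_command, workers=16):
    """Run command on every host in target, up to `workers` at a time.

    Returns a list of (host, exit_code, stdout) in host order.
    """
    hosts = expand(target, groups)

    def run_one(host):
        proc = subprocess.run(build(host, command), capture_output=True, text=True)
        return host, proc.returncode, proc.stdout

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_one, hosts))
```

A call such as `run_everywhere("@prod", "uptime", GROUPS)` would then contact the three hosts concurrently, assuming key-based ssh access is already in place. The `build` parameter exists so the transport can be swapped (for example for rsh or a dry-run command) without touching the grouping logic.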


ISPMan is a distributed system used to manage components of an ISP from a central management interface. It's written entirely in Perl, using an LDAP backend to manage DNS, Apache virtual hosts, Postfix, Cyrus, FTP, etc. It provides a central Web-based user interface for admins/helpdesk and a command-line interface to automate tasks or hook into other systems.

Scientific Linux (Score:3)

by scheme (19778) writes: on Thursday May 26, 2011 @03:30PM (#36255628)
A lot of this depends on what you're doing with your cluster and what apps you're running. However, Scientific Linux is used by quite a few large clusters, and all of the US ATLAS and CMS clusters run on it. As others have mentioned, you probably want to be more interested in how the cluster is managed and how nodes are set up and kept up to date.

I'd recommend something like cobbler and puppet or some other change management system so that you can set up profiles and have them propagated to the various nodes automatically.

This is preferable and easier than going through and making the same configuration changes on 5-10 machines.

Building Clusters (Score:5, Informative)

by Nite_Hawk (1304) writes: on Thursday May 26, 2011 @03:15PM (#36255412) Homepage


I work at a Supercomputing Institute. You can run many different OSes and be successful with any of them.

We run SLES on most of our systems, but CentOS and Red Hat are fine, and I'm using Ubuntu successfully for an OpenStack cloud.

Rocks is popular, though it ties you to certain ways of doing things which may or may not be your cup of tea. Certainly it offers you a lot of common cluster software prepackaged, which may be what you are looking for.

More important than the OS are the things that surround it. What does your network look like? How are you going to install nodes, and how are you going to manage software?

  • Personally, I'm a fan of using dhcp3 and tftpboot along with kickstart to network boot the nodes and launch installs, then network boot with a pass-through to the local disk when they run.
  • Once the initial install is done I use Puppet to take over the rest of the configuration management for the node based on a pre-configured template for whatever job that node will serve (for clusters it's pretty easy since you are mostly dealing with compute nodes).
  • It becomes extremely easy to replace nodes by just registering their MAC address and booting them into an install. This is just one way of doing it though. You could use cobbler to tie everything together, or use FAI. xCAT is popular on big systems, or you could use SystemImager, or replace puppet with chef or cfengine...
  • Next you have to decide how you want to schedule jobs. You could use Torque and Maui, or Sun Grid Engine, or SLURM...
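The "register the MAC address and network boot" step from the list above can be sketched as a small generator for ISC dhcpd host entries. This is only an illustration, not the poster's actual setup: the node name, the addresses, the TFTP server `10.0.0.1` and the boot file `pxelinux.0` are assumed values, and `normalize_mac` and `host_stanza` are hypothetical helper names. The statements emitted (`hardware ethernet`, `fixed-address`, `next-server`, `filename`) are standard dhcpd.conf host declarations.

```python
import re

# Six colon-separated pairs of lowercase hex digits.
MAC_RE = re.compile(r"^[0-9a-f]{2}(:[0-9a-f]{2}){5}$")


def normalize_mac(mac):
    """Lowercase a MAC address and accept '-' as well as ':' separators."""
    mac = mac.strip().lower().replace("-", ":")
    if not MAC_RE.match(mac):
        raise ValueError(f"not a MAC address: {mac!r}")
    return mac


def host_stanza(name, mac, ip, tftp_server="10.0.0.1", bootfile="pxelinux.0"):
    """Return a dhcpd.conf host block that pins an IP and a PXE boot file."""
    return (
        f"host {name} {{\n"
        f"  hardware ethernet {normalize_mac(mac)};\n"
        f"  fixed-address {ip};\n"
        f"  next-server {tftp_server};\n"
        f'  filename "{bootfile}";\n'
        f"}}\n"
    )


if __name__ == "__main__":
    # Replacing a failed node then amounts to emitting a new stanza
    # for the same name and IP with the new card's MAC address.
    print(host_stanza("node01", "00-1A-2B-3C-4D-5E", "10.0.0.11"))
```

Keeping the node list in one place and regenerating the DHCP entries from it is what makes swapping hardware cheap: the name, IP and role stay the same, and only the MAC changes.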

Re:Building Clusters (Score:3)

by clutch110 (528473) writes: on Thursday May 26, 2011 @03:34PM (#36255698)

This post is full of good information. I have been managing HPC for seismic companies for the past 8 years now. I regularly use xCAT as I find that after a few nodes automation is the way to go.

You will find that most clusters run Red Hat or a variant of the OS. Most places run CentOS on the nodes and have a machine with Red Hat stashed around somewhere in case a problem occurs and they need to reproduce it on a "supported" OS.

Why is there a requirement for a full-blown X install? Are these machines desktop boxes or are they racked? Typically you have thin-client software installed at the cluster gateway. We use both NX and ThinAnywhere today.

Re:Building Clusters (Score:1)

by GPSguy (62002) writes: on Friday May 27, 2011 @08:44AM (#36262156) Homepage

There is also the occasional need for something like VNC when you absolutely, positively have to have that remote desktop look for your visualization software.

Re:Building Clusters (Score:2)

by Junta (36770) writes: on Thursday May 26, 2011 @06:53PM (#36258020)

Incidentally, I'm an xCAT developer and am always interested in ways to make it scale down a bit better as well as it scales up. Historically, it has been worth it at large scale, but a bit too much configuration for small systems. A lot of settings support autodetect now and if you don't care much about the nitty-gritty, applying the default templates provides a serviceable set of groups that can drive configuration instead of micromanaging all sorts of details.

In terms of DHCP, it generally allows and uses nice features of ISC DHCP without requiring more complex config than dnsmasq, which is probably one of the nicer things it does. Notably, if you are doing iSCSI, deploying Windows, want an NFS or ramroot system, or other fancy stuff, xCAT does a good job of taking care of the requisite details (I'm a bit biased).

I'm always interested in other ways to make it easier or criticisms on how it's deficient. Preferably on the xcat-user mailing list.

What Will You Run, and Who Will Run It? (Score:1)

by javanree (962432) writes: on Thursday May 26, 2011 @09:41PM (#36259152)

Bright Cluster Manager is quite nice, but still lacking loads of things. The shell is powerful but very cryptic, the graphical interface doesn't allow certain operations to be done on a range of selected nodes for instance...

Also the 'integration' with, for instance, Sun Grid Engine (which is supported) is not very thorough (especially with regard to setting up queues, something the Sun tools already suck at). I still need Bright Cluster Manager + the SGE tools + Sun ARCo to get almost everything done, and at times it feels like a lot of duplicated effort, and some good old *NIX handiwork is still required to really get things moving.

However the development team is very receptive to constructive feedback and much has changed over the past 6 months.

Recommended Links


Cluster management by James E Prewett.