Softpanorama

May the source be with you, but remember the KISS principle ;-)
Skepticism and critical thinking are not a panacea, but they can help to understand the world better

Heterogeneous Unix server farm administration


“it is better to solve the right problem the wrong way than the wrong problem the right way”.

A server farm is a set of Unix servers with the same OS but different hardware or hardware configurations. A server farm consists of groups of computers that are almost identical hardware-wise and can be managed simultaneously because they have mostly identical configuration files. Such a group should be viewed as a class, and the individual servers as instances of this class.
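The class/instance view can be sketched in a few lines of Python. This is only an illustration, not a tool described on this page: the `Host` record, the `group_into_classes` function, and the inventory entries are all hypothetical, and the sketch assumes that "almost identical" can be approximated by matching OS and hardware model strings.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    """One server, i.e. an instance of a server class (hypothetical record)."""
    name: str
    os: str        # e.g. "RHEL 7"
    hardware: str  # e.g. a vendor model string


def group_into_classes(hosts):
    """Group hosts that share the same OS and hardware model into one class."""
    classes = defaultdict(list)
    for host in hosts:
        classes[(host.os, host.hardware)].append(host.name)
    return {key: sorted(names) for key, names in classes.items()}


# Invented inventory, for illustration only.
inventory = [
    Host("web01", "RHEL 7", "model-A"),
    Host("web02", "RHEL 7", "model-A"),
    Host("db01", "RHEL 7", "model-B"),
]

for (os_name, hardware), members in sorted(group_into_classes(inventory).items()):
    print(f"{os_name} / {hardware}: {', '.join(members)}")
```

Once hosts are grouped this way, a configuration change can be prepared once per class and then applied to every member of that class, instead of being repeated host by host.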

Over the course of several years of deploying, reworking, and administering such farms (mainly Linux and Solaris), we developed a certain methodology and toolset. We began thinking of an entire farm as something similar to a single high-performance cluster, rather than as a collection of individual hosts. This change of perspective, and the decisions it prompted, made a world of difference in cost and ease of administration. The standard functionality includes:

There is relatively little prior art in print that addresses the problems of server farms, and there really is no "standard" way to assemble or manage them.

Because infrastructures are usually ad hoc, setting up a new infrastructure or attempting to harness an existing unruly one can be bewildering for new sysadmins. At the same time, a large part of the knowledge that exists for HPC clusters applies here as well. The sequence of steps needed to develop such a "class-based" infrastructure is relatively straightforward, but the discovery of that sequence can be time-consuming and fraught with error. Moreover, mistakes made in the early stages of setup or migration can be difficult to remove for the lifetime of the infrastructure.

We will discuss the sequence that we developed and offer a brief glimpse into a few of the many tools and techniques this perspective generated. Typical problems that need to be at least partially solved include:

  1. The documentation is incomplete

    Complete network, host and application documentation is a must for every site. An admin who is constantly called while on vacation, or while others are on call, is probably not documenting their systems properly. Everything needed to understand and fix problems at the site should be clearly documented in a central location.
     

  2. The systems aren't discoverable

    The systems should be as self-documenting as possible. This means that start/stop/reload scripts should be in conventional locations (like /etc/init.d/ on SysV and on most Linuxes), MOTD messages should give helpful info, scripts should have comments explaining their usage and purpose, and automated alerts should send useful information about the error condition(s) found. Leave nothing to be rediscovered every time someone new has to work on the systems; let them spend their time working on the actual issue(s) at hand.

    An admin unfamiliar with the machine(s) in question should be able to find their way around the system with a minimum of trouble.
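One of the points above, that scripts should carry comments explaining their usage and purpose, is easy to check mechanically. The following is a minimal sketch under stated assumptions: the function name `scripts_missing_header` is hypothetical, and "documented" is defined here, somewhat arbitrarily, as "the first non-blank line after an optional `#!` line is a `#` comment". The demo uses a throwaway directory; on a real host one might point it at a directory such as /etc/init.d/.

```python
import os
import tempfile


def scripts_missing_header(directory):
    """Return sorted names of files in `directory` that lack a leading comment.

    Assumption of this sketch: a file counts as documented when, ignoring
    blank lines and '#!' lines, its first line starts with '#'.
    """
    missing = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue
        with open(path, "r", errors="replace") as handle:
            lines = [line.strip() for line in handle]
        body = [line for line in lines if line and not line.startswith("#!")]
        if not body or not body[0].startswith("#"):
            missing.append(name)
    return missing


# Demo on a temporary directory with one documented and one undocumented script.
with tempfile.TemporaryDirectory() as workdir:
    with open(os.path.join(workdir, "good"), "w") as handle:
        handle.write("#!/bin/sh\n# Starts and stops the example daemon\nexit 0\n")
    with open(os.path.join(workdir, "bad"), "w") as handle:
        handle.write("#!/bin/sh\nexit 0\n")
    undocumented = scripts_missing_header(workdir)

print("Scripts without a descriptive header:", undocumented)
```

A check like this says nothing about whether the comment is useful, but it gives a cheap list of places where the next admin would have to rediscover everything.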

Many admins become mired in their site's problems, and stop trying to improve their situation. They accept that their disks keep filling up, that their applications keep dying, and that mundane tasks take up all their time.

If they were to write cron jobs to trim files that grow until filesystems fill, restart dead applications with init, cfengine, cron scripts, or daemontools, and automate repetitive tasks from cron or cfengine, they would have a smoothly running network. Once things run smoothly, they can spend their time updating software, improving security, or working on any of the many projects that improve overall conditions. Such projects get little effort spent on them at sites without a proactive attitude.
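As an illustration of the first item, trimming files that grow until filesystems fill, here is a minimal sketch of a helper that could be run from cron. The function name `trim_file` and its parameters are hypothetical. The sketch assumes it is acceptable to rewrite the file in place and to lose the first (possibly partial) line of the kept tail; an application that holds the file open without append mode may not cooperate with this, and in practice a dedicated tool such as logrotate is often the better choice.

```python
import os
import tempfile


def trim_file(path, max_bytes, keep_bytes):
    """Cut `path` down to roughly its last `keep_bytes` bytes if it exceeds `max_bytes`.

    Returns True when the file was trimmed, False when it was left alone.
    """
    if keep_bytes > max_bytes:
        raise ValueError("keep_bytes must not exceed max_bytes")
    size = os.path.getsize(path)
    if size <= max_bytes:
        return False
    with open(path, "rb+") as handle:
        handle.seek(size - keep_bytes)
        tail = handle.read()
        # Drop the first (possibly partial) line so the file starts on a line boundary.
        newline = tail.find(b"\n")
        if newline != -1:
            tail = tail[newline + 1:]
        handle.seek(0)
        handle.write(tail)
        handle.truncate()
    return True


# Demo on a temporary file: ten 7-byte lines, 70 bytes in total.
with tempfile.TemporaryDirectory() as workdir:
    log_path = os.path.join(workdir, "app.log")
    with open(log_path, "wb") as handle:
        handle.write(b"".join(b"line %d\n" % i for i in range(10)))
    trimmed = trim_file(log_path, max_bytes=50, keep_bytes=20)
    with open(log_path, "rb") as handle:
        remaining = handle.read()

print(trimmed, remaining)
```

The point is less the code itself than the habit: once a recurring failure such as a full filesystem is handled by a scheduled job, it stops consuming the admin's attention.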



Etc

The Last but not Least

Technology is dominated by two types of people: those who understand what they do not manage and those who manage what they do not understand. ~Archibald Putt, Ph.D.


Copyright © 1996-2018 by Dr. Nikolai Bezroukov. www.softpanorama.org was initially created as a service to the (now defunct) UN Sustainable Development Networking Programme (SDNP) in the author free time and without any remuneration. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License. Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.

Last modified: March 12, 2019