
Unix System Monitoring

Version 2.01 (March 2017)


Any sufficiently large and complex monitoring package written in C contains a buggy
and ad hoc implementation of at least 66% of a Perl interpreter.

A reinterpretation of P. Greenspun's quote about Lisp


System monitoring, specifically Unix system monitoring, is an old idea. There has not been much progress in the last thirty years compared with the state of the art in the early 1990s, when Tivoli and OpenView entered the marketplace. Moreover, it is now clear that any overly complex monitoring system is actually counterproductive. Sysadmins simply don't use it, or they use a small fraction of its functionality, because they are unable or unwilling to master the required level of complexity. They already have too much complexity on their plate to want more.

Another important factor is the tremendous success of protocols such as SSH and tools like rsync (or, put differently, another nail in the coffin of expensive proprietary monitoring systems such as IBM Tivoli or HP OpenView). They change the equation, making separate proprietary channels of communication between the monitoring clients and the "mother ship" less necessary.

Another important development is the proliferation and relative success of Unix/Linux configuration management systems. These also have some monitoring component, or can be programmed to perform monitoring tasks along with configuration tasks.

Even HP OpenView, which is somewhat better than several other commercial systems, looks like huge overkill for a typical sysadmin: too much stuff to learn, too little return on investment. And if OpenView is managed by a separate department, this is simply a disaster. Typically those people are completely detached from the needs of rank-and-file sysadmins and live in their own imaginary (compartmentalized) world. Moreover, they are prone to creating red tape, and as a result stupid, unnecessary probes are installed and stupid tickets are generated.

In one organization those people decided to offload the problem of dying OpenView agent daemons to sysadmins (in OpenView these daemons tend to die regularly and spontaneously), creating a stream of completely useless tickets. That was probably the easiest way to condition sysadmins to hate OpenView. As a result, communication between the OpenView team and sysadmins froze. The system fossilized and served no useful purpose at all: just a "waving a dead chicken" type of system. Those "monitoring honchos" enjoyed life for a while, until they were outsourced. Meanwhile, useful monitoring of filesystem free space was done by a simple shell script written by one of the sysadmins ;-). So much for the investment in OpenView and for paying specialized monitoring staff.

As for Tivoli deployments, I sometimes think that selling their products is a kind of subversive work by some foreign power that wants to undermine US IT (isn't IBM too cozy with China? :-). They did produce good e-books, called Redbooks, though ;-)

At its core, a monitoring system is a specialized scheduler that executes local or remote jobs (called probes) at predetermined times, typically every N minutes. On remote servers, execution can be agentless or can use a specialized agent (endpoint). In the agentless case, SSH or telnet is typically used as the agent, although a shared filesystem such as NFS can also be used. There have not been any really revolutionary ideas in this space for the last 20 years or so. That absence of radical new ideas permits commoditization of the field and a corresponding downward pressure on prices, with open source products now "good enough". Some firms still try to play the "high price, high value" game with the second-rate software they own. I think, however, that the time for premium prices for monitoring products has passed.
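The following minimal Python sketch makes the "specialized scheduler" idea concrete. It keeps probes in a priority queue ordered by next run time. Each probe runs either locally or agentlessly over ssh. The probe names, the host name `web01` and the intervals are hypothetical. A real system would also need logging, per-probe timeouts and alert routing.

```python
import heapq
import subprocess
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass(order=True)
class Probe:
    """One scheduled probe; only next_run takes part in heap ordering."""
    next_run: float
    name: str = field(compare=False)
    command: List[str] = field(compare=False)
    interval: float = field(compare=False)


def via_ssh(host: str, command: List[str]) -> List[str]:
    """Wrap a command so it runs agentlessly on a remote host.
    BatchMode=yes makes ssh fail instead of prompting for a password."""
    return ["ssh", "-o", "BatchMode=yes", host] + command


def run_command(command: List[str]) -> int:
    """Default runner: execute the probe and return its exit status."""
    return subprocess.run(command, capture_output=True, timeout=60).returncode


class ProbeScheduler:
    def __init__(self, runner: Callable[[List[str]], int] = run_command):
        self._queue: List[Probe] = []
        self._runner = runner

    def add(self, name: str, command: List[str], interval: float, start: float = 0.0) -> None:
        heapq.heappush(self._queue, Probe(start, name, command, interval))

    def run_due(self, now: float) -> List[Tuple[str, int]]:
        """Run every probe whose time has come, then reschedule it."""
        results = []
        while self._queue and self._queue[0].next_run <= now:
            probe = heapq.heappop(self._queue)
            results.append((probe.name, self._runner(probe.command)))
            # Reschedule relative to "now" so a late cycle does not trigger
            # a burst of catch-up runs.
            probe.next_run = now + probe.interval
            heapq.heappush(self._queue, probe)
        return results
```

A driver loop would call `sched.run_due(time.time())` and sleep for a second or so between calls. Rescheduling relative to the current time, rather than to the previous slot, gives up strict periodicity. In exchange, it avoids a burst of catch-up runs after the monitoring host was busy.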

Now the baseline for comparison is several open source systems. You can try them for free and buy professional support later, and they usually have lower maintenance costs than proprietary systems. That does not mean that they can compete in all areas. For example, agent-based monitoring and event correlation are still done better by proprietary, closed source systems. But open source systems are usually more adaptable and flexible, which is an important advantage. Here is one apt quote:

Nagios is frankly not very good, but it's better than most of the alternatives in my opinion. After all, you could spend buckets of cash on HP OpenView or Tivoli and still be faced with the same amount of work to customize it into a useful state....

Monitoring layers

Unix system monitoring includes several layers:

  1. Hardware layer. A typical server's hardware includes the motherboard, processors, memory, I/O devices and fans. Each of them can go south, and each can be monitored via an SSH connection directly to the DRAC or iLO. This gives you a degree of independence from the state of the hardware and OS on the server, and it allows alerts (often very useful ones) to be transmitted from those systems. Monitored parameters include:
     - hard drive health (a critical thing to monitor);
     - CPU overheating (and the resulting throttling);
     - some more exotic but interesting parameters, such as electricity consumption. People usually "misunderestimate" how much their "mostly circulating air" server costs the company in its annual electricity bill ;-).
     All of these can be monitored via Dell DRAC, HP iLO or IPMI. Electricity supply in datacenters is usually very reliable. There are still periods of blackouts, however, when the system needs to run on a UPS or backup generator, at minimum long enough to shut down applications and the OS properly without data loss. Fans have rotating parts and as such are more prone to malfunction. Some components, like I/O controllers, can have a battery that permits writing cached data to the disk in case of a power outage. With age this battery may become unable to perform this function, so it should be proactively replaced.
  2. Operating system layer. Here the parameters available in /proc are very useful, for example uptime. The most typical task at this layer is monitoring filesystems for available free space. Actually, few systems do this task better than simple cron-driven scripts that use the "df -k" command ;-). Generally, monitoring such parameters as free memory, uptime, I/O, etc. helps to ensure high availability of the system. Operating system layer problems might be the result of problems at the hardware layer or the networking layer. Hence the importance of suppressing "derivative" problems. Few systems do it right without introducing excessive complexity (Tivoli's attempt to use Prolog for this purpose failed dismally).
  3. Networking layer. Modern systems are interconnected, but this connectivity can come and go. Probably the most useful and the most widely deployed type of monitoring is ICMP monitoring: you "ping" a remote host periodically to detect periods when network connectivity disappears. But sometimes ping is not enough. You need a TCP version of it, or even a probe of a specific application protocol, like HTTP. This is a large area of monitoring that has its own specialized systems.
  4. Application layer. On top of the OS there are system processes and applications. Some are started from RC scripts, some from cron or another scheduler. As with any complex system, a lot of things can go wrong. Applications can also suffer from problems in the underlying layers, especially the OS (running out of space, or CPU overload caused by other applications) and the networking layer. That means that suppression of "derivative events" via event correlation mechanisms is more important for this type of monitoring.
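At the hardware layer, much of the work is simply parsing what the baseboard management controller reports. Instead of an ssh session to the DRAC or iLO, the sketch below uses `ipmitool`, which can query the BMC locally or over the network. It assumes the three-column, pipe-separated output of `ipmitool sdr` (sensor name, reading, status). It flags every sensor whose status is not `ok` or `ns` (no reading). Status abbreviations and sensor names vary between BMC vendors and ipmitool versions, so treat the set of benign codes as an assumption to verify on your own hardware.

```python
import subprocess
from typing import List, Optional, Tuple

# Status codes treated as healthy (an assumption; check your BMC):
# "ok", and "ns" = no reading, e.g. an empty slot.
BENIGN = {"ok", "ns"}


def parse_sdr(text: str) -> List[Tuple[str, str, str]]:
    """Return (sensor, reading, status) for every sensor that is not benign."""
    problems = []
    for line in text.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) != 3:
            continue  # blank or unexpected line
        sensor, reading, status = parts
        if status.lower() not in BENIGN:
            problems.append((sensor, reading, status))
    return problems


def read_sdr(host: Optional[str] = None, user: Optional[str] = None) -> str:
    """Query the local BMC, or a remote one when host is given."""
    cmd = ["ipmitool"]
    if host:
        # IPMI v2.0 over the LAN; -E reads the password from the
        # IPMI_PASSWORD environment variable, keeping it off the command line.
        cmd += ["-I", "lanplus", "-H", host, "-E"]
        if user:
            cmd += ["-U", user]
    cmd.append("sdr")
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
```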
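When ping is not enough, the simplest "TCP version of ping" is to time a full TCP connect to the service port. Below is a minimal Python sketch. It only proves that something accepts connections on the port, not that the application behind it is healthy. That is why protocol-level probes (for example, an actual HTTP request) are the next step.

```python
import socket
import time
from typing import Optional


def tcp_probe(host: str, port: int, timeout: float = 3.0) -> Optional[float]:
    """Return the TCP connect time in seconds, or None if the port is unreachable."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:  # connection refused, timeout, unreachable network, DNS failure
        return None
```

Returning the connect time rather than a plain boolean costs nothing. It also lets the same probe feed a latency graph as well as an up/down alert.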
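Suppression of derivative events can start very simply. If a lower layer on a host is already failing, alerts from higher layers on the same host are most likely consequences of that failure. The sketch below assumes the layer ordering hardware, network, OS, application. This follows the remark above that OS-level problems may stem from hardware or network problems. Within one polling cycle, it keeps, per host, only the events from the lowest failing layer. Real event correlation also needs cross-host dependencies (a failed switch explains many unreachable hosts) and time windows, and this sketch deliberately omits both.

```python
from dataclasses import dataclass
from typing import Dict, List

# Assumed ordering, lowest first: a failure in a lower layer explains
# failures in the layers above it on the same host.
LAYERS = ["hardware", "network", "os", "application"]


@dataclass(frozen=True)
class Event:
    host: str
    layer: str  # must be one of LAYERS, otherwise LAYERS.index raises ValueError
    message: str


def suppress_derivatives(events: List[Event]) -> List[Event]:
    """Keep only root-cause events: drop an event if the same host also has
    an event in a lower layer during this polling cycle."""
    lowest: Dict[str, int] = {}
    for e in events:
        rank = LAYERS.index(e.layer)
        lowest[e.host] = min(rank, lowest.get(e.host, rank))
    return [e for e in events if LAYERS.index(e.layer) == lowest[e.host]]
```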

Pitfalls in operating systems monitoring

In this page we will mainly discuss operating system monitoring, from our traditional "slightly skeptical" viewpoint. First of all, if the system is geographically remote, it is considerably more difficult to determine what went wrong. You lose a significant part of the context of the situation, context that is obvious to local personnel. Remote cameras can help to provide some local context, but they are still not enough. It's much like flying an airplane at night: you need to rely solely on instruments. In this case you need a more sophisticated system. Virtual machines are another large and somewhat distinct category, and they too can sit in distant locations.

Most system processes write some messages to syslog when things go wrong. That means that the first thing in OS monitoring should be monitoring of system logs. This is seldom done, and extremely rarely done correctly. The second thing is monitoring of disk free space, which is also seldom done correctly. This simple problem does not have a simple solution and has a lot of intricate details that need to be taken into account. Various filesystems usually need different thresholds: 100% utilization of some filesystems is OK, while on others (such as /tmp) it is a source of problems.

Logs and free space are the two areas where real work on a robust monitoring system should start. Do not start by acquiring a system with a set of useful or semi-useful probes. Instead, get a sophisticated log analyzer, and write yourself a free filesystem space analyzer customized for your environment. A reasonably competent free space analyzer can be written in less than 1,000 lines of Perl or Python, with individual thresholds for each filesystem and two stages of alerts (warning and critical). That means that it can be written and debugged in a week or two.
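Here is a minimal Python sketch of the core of such a free space analyzer. It assumes the POSIX output format of `df -kP`: one line per filesystem, capacity in the fifth column, mount point in the sixth. The thresholds table is hypothetical, including the ignored `/mnt/iso` mount. It exists only to show per-filesystem thresholds with warning and critical stages.

```python
import subprocess
from typing import Dict, List, Optional, Tuple

# Hypothetical per-mount thresholds: (warning %, critical %); None = never alert.
THRESHOLDS: Dict[str, Optional[Tuple[int, int]]] = {
    "/": (85, 95),
    "/tmp": (70, 85),   # a full /tmp breaks many programs, so alert early
    "/mnt/iso": None,   # a mounted read-only image is always 100% full
}
DEFAULT = (90, 97)


def parse_df(text: str) -> List[Tuple[str, int]]:
    """Parse `df -kP` output into (mount point, percent used) pairs."""
    result = []
    for line in text.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) < 6:
            continue
        pct = fields[4].rstrip("%")
        if not pct.isdigit():           # e.g. "-" for pseudo filesystems
            continue
        # Columns: Filesystem 1024-blocks Used Available Capacity Mounted-on
        result.append((" ".join(fields[5:]), int(pct)))
    return result


def classify(usage: List[Tuple[str, int]], thresholds=THRESHOLDS, default=DEFAULT):
    """Return (severity, mount, percent) for filesystems over a threshold."""
    alerts = []
    for mount, pct in usage:
        limits = thresholds.get(mount, default)
        if limits is None:
            continue
        warn, crit = limits
        if pct >= crit:
            alerts.append(("CRITICAL", mount, pct))
        elif pct >= warn:
            alerts.append(("WARNING", mount, pct))
    return alerts


def check_local():
    # No check=True: df may exit non-zero for one bad mount yet still print the rest.
    out = subprocess.run(["df", "-kP"], capture_output=True, text=True).stdout
    return classify(parse_df(out))
```

Run from cron every few minutes, `check_local()` returns only the filesystems that crossed a threshold. Alert deduplication, escalation and reporting are what fill out the rest of a production version.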

Please be aware that some commercial offerings in the category of log analyzers are weak, close to junk, and survive only due to relentless marketing (Splunk might be one such example).

Some people use databases to process logs. This is not a bad idea, but it depends on two things. The first is your level of familiarity with databases and SQL; typically this is an attractive option for sysadmins who maintain a lot of MySQL or Oracle databases. The second is the size of your log files. With extremely large log files you had better stay within the flat-file paradigm, although SSDs have recently changed this equation. Spam filters can serve as a prototype for useful log analyzers. When analyzing flat files, regexes are a must, so Perl looks like the preferable scripting language for this type of analyzer. A reasonably competent analyzer can be written in 2-3K lines of code. Multiple prototypes can be downloaded from the Web or from the distribution you are using (see, for example, Logwatch).

The key problem here (vividly represented by Logwatch) is that the set of "informative" log messages tends to fluctuate over time. It is generally OS-version dependent: it varies even from one release to another, and it is drastically different between, for example, RHEL 5 and RHEL 6. After a year and a couple of upgrades, your database of alerts becomes semi-useless. If you have time to do another cycle of modifying the script, good. If not, you have another monitoring script that is "waving a dead chicken". One way to avoid this situation is to use Syslog Anomaly Detection Analyzers, but they are still pretty raw and can produce many false positives.
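The spam-filter analogy suggests a simple structure with three parts:
- a list of "ham" patterns to ignore;
- a list of known alert patterns;
- an unclassified bucket for everything else, for a human to review.

The sketch below is in Python rather than Perl, but the regex-driven approach is the same. All patterns are hypothetical examples. Message texts differ between OS releases, which is exactly why the unclassified bucket matters: new or reworded messages surface there instead of silently falling through.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List, Pattern, Tuple

# Hypothetical rule sets; real ones must be tuned for each OS release.
IGNORE: List[Pattern[str]] = [
    re.compile(r"systemd\[\d+\]: (Started|Starting) "),
    re.compile(r"CRON\[\d+\]: \(\w+\) CMD "),
]
ALERT: Dict[str, Pattern[str]] = {
    "disk_io_error": re.compile(r"I/O error, dev (\w+)"),
    "oom_kill": re.compile(r"Out of memory: Kill(?:ed)? process (\d+)"),
    "ssh_auth_fail": re.compile(r"sshd\[\d+\]: Failed password for (?:invalid user )?(\S+)"),
}


def analyze(lines: Iterable[str]) -> Tuple[Counter, List[str]]:
    """Count known alert patterns and return unmatched lines for human review."""
    alerts: Counter = Counter()
    unknown: List[str] = []
    for line in lines:
        if any(p.search(line) for p in IGNORE):
            continue  # known noise
        for name, pattern in ALERT.items():
            if pattern.search(line):
                alerts[name] += 1
                break
        else:  # no alert pattern matched: unclassified
            unknown.append(line)
    return alerts, unknown
```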

If you manage a large number of systems, it is important for your sanity to see the situation on existing boxes via a dashboard and an integrated alert stream. You just physically can't log in and check boxes one by one. Monitoring is not a one-size-fits-all solution, but a lot of tasks can be standardized. Instead of reinventing the bicycle, you can adopt them from, or build them on top of, an existing open source system. Reinventing the bicycle is usually a pretty expensive exercise, unless you are a real expert in LAMP. You are probably better off betting on one of the popular open source systems such as Nagios and using its framework for writing your own scripts.
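Nagios's framework for custom scripts is mostly a convention. A plugin prints one status line and exits with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). Anything after a `|` in the output is performance data. Below is a minimal Python sketch of a hypothetical disk plugin that follows this convention. Nagios ships its own `check_disk`; this one only illustrates how little is needed to write such a plugin.

```python
import shutil
import sys
from typing import List, Tuple

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(path: str, used_pct: float, warn: float, crit: float) -> Tuple[int, str]:
    """Map a usage percentage to a Nagios exit code and status line."""
    if used_pct >= crit:
        code = CRITICAL
    elif used_pct >= warn:
        code = WARNING
    else:
        code = OK
    # Text before "|" is shown to operators; after it is perfdata:
    # label=value[UOM];warn;crit;min;max
    line = (f"DISK {NAMES[code]} - {path} {used_pct:.1f}% used"
            f" | used_pct={used_pct:.1f}%;{warn};{crit};0;100")
    return code, line


def main(argv: List[str]) -> int:
    try:
        path, warn, crit = argv[1], float(argv[2]), float(argv[3])
        usage = shutil.disk_usage(path)
    except (IndexError, ValueError, OSError) as exc:
        print(f"DISK UNKNOWN - {exc}")
        return UNKNOWN
    # used/total; df's Use% differs slightly because it excludes root-reserved blocks.
    code, line = evaluate(path, 100.0 * usage.used / usage.total, warn, crit)
    print(line)
    return code
```

To use it as a plugin, do the following:
- Save it as, say, `check_fs_pct.py` (a hypothetical name).
- End the file with `sys.exit(main(sys.argv))`.
- Call it with a path and warning/critical percentages, e.g. `check_fs_pct.py /var 85 95`.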

The problem of monitoring is complicated by the fact that Unix system monitoring in most large organizations is typically far from rational. Sometimes it is close to a Kafkaesque level of bureaucratic absurdity ;-). By that we mean that it is marked by senseless, illogical, disorienting, often menacing complexity and by bureaucratic barriers. Most large organizations have monitoring infrastructures crippled by this phenomenon. The following situations are pretty typical:

Few people understand that the key question in a sound approach to monitoring is choosing the right level of complexity. It must be optimal for the system administrators, who, due to overload, are the weakest link in the system. At the same time, it must still produce at least 80% of the results necessary to keep a system healthy. Actually, in many cases the useful set of probes is much smaller than one would expect:
- Monitoring disk filesystems for free space is typically the No. 1 task. In many enterprise deployments it probably constitutes 80% of the total value of the monitoring system.
- Monitoring a few performance parameters of the server (CPU, uptime, I/O) is probably No. 2, accounting for 80% of the residual 20%, and so on.

In other words, the Pareto law is fully applicable to monitoring.
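Reading the No. 2 parameters is cheap on Linux, because /proc exposes them as plain text. `/proc/uptime` holds the seconds since boot and the idle seconds. `/proc/loadavg` starts with the 1-, 5- and 15-minute load averages. The sketch below takes the file contents as strings, which makes it easy to test. It applies two illustrative thresholds: a reboot within the last hour, and a 15-minute load above 2 per CPU. Both thresholds are assumptions to tune for each server role.

```python
import os
from typing import List


def read_proc(name: str) -> str:
    """Read a /proc file such as "uptime" or "loadavg" (Linux only)."""
    with open(os.path.join("/proc", name)) as f:
        return f.read()


def check_uptime_and_load(uptime_text: str, loadavg_text: str, cpus: int,
                          min_uptime: float = 3600.0,
                          max_load_per_cpu: float = 2.0) -> List[str]:
    """Return alert messages; the default thresholds are illustrative assumptions."""
    alerts = []
    uptime_s = float(uptime_text.split()[0])  # "<seconds up> <idle seconds>"
    if uptime_s < min_uptime:
        alerts.append(f"recent reboot: up {uptime_s:.0f}s")
    load15 = float(loadavg_text.split()[2])   # fields: 1, 5, 15 min averages, ...
    if load15 / cpus > max_load_per_cpu:
        alerts.append(f"sustained load {load15:.2f} on {cpus} CPUs")
    return alerts
```

On a live system the call would be `check_uptime_and_load(read_proc("uptime"), read_proc("loadavg"), os.cpu_count() or 1)`.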

Simplicity pays nice dividends. If a tool is written in a scripting language that matches the skill level of the sysadmins, they can better understand it and adapt it to the environment. They thus get far superior results than with any "off the shelf" tool. For example, if local sysadmins know only shell (no Perl, no JavaScript), then the ability to write probes in shell is really important. Any attempt to deploy tools like ITM 5.1 (with probes written in JavaScript) is then just a costly mistake.

Also, avoiding spending a lot of money on acquisition, training and support of an overly complex tool frees money to pay more for support, including separately paid incidents. Vendors love these and typically serve them with very high priority because, unlike annual maintenance contracts, they represent an "unbooked" revenue source.

Companies like IBM try to push proprietary tools down your throat for, say, half a million dollars in annual maintenance fees alone, using cheap tricks like charging per core. Let's consider whether such a set of tools is that much better than a set of free open source tools covering the same monitoring and scheduling tasks. I bet you can get pretty good quality 24x7 support for a small fraction of this sum, and at the end of the day that is all that matters. I have seen many cases in which companies bought an expensive package and implemented a subset of functionality that was just a little more than ICMP (aka ping) monitoring. In other cases, the subset of functionality actually used could be replicated much more successfully by half a dozen simple Perl scripts. The Alice in Wonderland of perversions of corporate system monitoring is still to be written, but it is clear that regular logic is not applicable to a typical corporate environment. Or maybe it should be not Alice in Wonderland but

Softpanorama Law of Monitoring

Another important consideration is what we can call the Softpanorama law of monitoring: if, in a large organization, the complexity of a monitoring tool exceeds a certain threshold, the monitoring system usually becomes stagnant and people are reluctant to extend and adapt it to new tasks. The threshold depends on the number of sysadmins dedicated to this task and their level of specialization, and on the programming skills of all other sysadmins. Instead of being part of the solution, such a tool becomes part of the problem.

This is the typical situation at the level of complexity of Tivoli, CA Unicenter and, to a slightly lesser extent, HP Operations Manager (formerly OpenView). For example, writing rules for Tivoli TEC requires some understanding of Prolog, a very rare, almost non-existent skill among Unix sysadmins. It also requires Perl, knowledge of which is far more common but far from universal among sysadmins, especially on Windows.

Because of adaptability, simpler open source monitoring systems have tremendous advantages over complex ones in the long run, provided they use just the language sysadmins know well, be it Bash or Perl. Adaptability of the tool is an important characteristic, and it is unwise (but pretty common) to ignore it.


I suspect that the level of complexity should be much lower than that of the monitoring solutions used in most large organizations. Goldman Sachs, for example, extensively uses Nagios, despite being probably the richest organization on the planet ;-). Such cases show that corporate IT bureaucracy can be overcome. In any case, the fact on the ground is that in many large organizations, complex monitoring systems are badly maintained and their capabilities are hugely underutilized. Sometimes they become almost useless, as in the OpenView example above. This demonstrates that raising the complexity of a monitoring system above a certain level is simply counterproductive, and that simpler, more nimble systems have an edge. Sometimes two simple systems (one for OS monitoring, one for network and application probes) outperform a single complex system by a large margin.

In other words, most organizations suffer from feature creep in monitoring systems in the same way they suffer from feature creep in regular applications.

Major categories of operating system monitoring

Like love, system monitoring is a word with multiple meanings. We can define several categories of operating system monitoring:

  1. Monitoring system logs. This is the sine qua non of operating system monitoring. A must. If this is not done (and done properly), there is no reason to discuss any other aspects of monitoring. As Talleyrand characterized such a situation, "this is worse than a crime, this is a blunder." In Unix this presupposes the existence of a centralized server, the so-called LOGHOST server. Few people understand that log analysis on the LOGHOST server by itself represents a pretty decent distributed monitoring system. Instead of reinventing the wheel, you can enhance it with two things:
     - probes that run from cron and write messages to syslog;
     - a monitoring script on the LOGHOST that picks up specific messages (or sets of messages) from the log.

    In a typical Unix implementation such as Solaris or RHEL 6, a wealth of information is collected by the syslog daemon. It is put in /var/log/messages (Linux), /var/adm/messages (Solaris) or /var/adm/syslog/syslog.log (HP-UX). There are now "crippled" distributions that use journald without a syslog daemon, but RHEL 7 continues to use rsyslogd.

    Unix syslog originated in the Sendmail project. It records various conditions, including crashes of components, failed login attempts and many other useful things, such as information about the health of key daemons. This is an integral area that overlaps each and every area described above, but it still deserves to be treated separately. System logs provide a wealth of information about the health of the system, most of which usually goes unused. One reason is that it is buried in noise. Another is that the classic syslog daemon has outlived its usefulness. syslog-ng, used as a replacement for syslogd in SUSE 10 and 11, provides quite good log filtering abilities, but unfortunately it is very complex to configure and difficult to debug.

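    Sending the log stream from all similar systems to a dedicated log server is also important from the security standpoint. An intruder who gains root on a host can erase its local logs, but not the copies already forwarded.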

  2. Monitoring System Configuration Changes. This category includes monitoring for changes in hardware and software configurations that can be caused by an operating system upgrade, patches applied to the system, changes to kernel parameters, or the installation of a new software application.

    The root cause of system problems can often be traced back to an inappropriate hardware or software configuration change. Therefore, it is important to keep accurate records of these changes, because the problem that a change causes may remain latent for a long period before it surfaces. Adding or removing hardware devices typically requires the system to be restarted, so configuration changes can be tracked indirectly (in other words, remote monitoring tools would notice system status changes).

    However, software configuration changes, or the installation of a new application, are not tracked in this way, so reporting tools are needed. Also, more systems are becoming capable of adding hardware components online, so hardware configuration tracking is becoming increasingly more important.

    Here version control systems and Unix configuration management tools directly compete with monitoring systems. As I mentioned, some Unix configuration management systems have agents, and as such they can replicate the lion's share of typical Unix monitoring system tasks.

  3. Monitoring System Faults. After ensuring that the configuration is correct, the first thing to monitor is the overall condition of the system. Is the system up? Can you talk to it, ping it, run a command? If not, a fault may have occurred. Detecting system problems varies from determining whether the system is up to determining whether it is behaving properly. If the system either isn't up or is up but not behaving properly, then you must determine which system component or application is having a problem.

  4. Monitoring System Resource Utilization. For an application to run correctly, it may need certain system resources, such as the amount of CPU, memory or I/O bandwidth it is entitled to use during a time interval. Other examples include the number of open files or sockets, message segments, and system semaphores that an application has. Usually an application (and the operating system) has fixed limits for each of these resources, so it is important to monitor usage as it approaches those limits. If a resource is exhausted, the system may no longer function properly. Another aspect of resource utilization is studying the amount of resources that an application has used. You may not want a given workload to use more than a certain amount of CPU time or a fixed amount of disk space. Some resource management tools, such as quota, can help with this.

  5. Monitoring System Performance. Monitoring the performance of system resources can help to indicate problems with the operation of the system. Bottlenecks in one area usually impact system performance in another area. CPU, memory and disk I/O bandwidth are the important resources to watch for performance bottlenecks. To establish baselines, you should monitor the system during typical usage periods. Understanding what is "normal" helps to identify when system resources are scarce during particular periods (for example, "rush hours"). Resource management tools are available that can help you allocate system resources among applications and users.

  6. Monitoring System Security. Protecting your systems and information from determined intruders is a pipe dream, given the existence of such organizations as the NSA and CIA. For such materials you really should consider returning to typewriters and disallowing any electronic copy. Still, some level of difficulty for intruders can and should be created. Among other things, that includes so-called "monitoring for unusual activities": monitoring of lastlog, unusual permissions, unusual changes in /etc/passwd, and other similar "suspicious" activities. This is generally an area separate from "regular monitoring", and specialized systems exist for it. This type of monitoring is difficult to do right, as the notion of suspicious activity is so fuzzy. Performance and resource controls can also be useful for detecting such activities.

    A separate task is so-called hardening of the system: ensuring compliance with the policies set for the systems (permissions of key files, configuration of user accounts, the set of people who can assume the role of root, etc.). The value of specialized security tools is often overstated, but in small doses they can be useful, not harmful. That applies first of all to so-called hardening scripts and local firewall configurators. For example, it is easy to monitor for world-writable files and wrong permissions on home directories and key system directories. There is no reason not to implement this set of checks. In many cases static (configuration settings) security monitoring can be adapted from an existing hardening package, such as the now obsolete Titan or its more modern derivatives.

    As a side note, I would like to mention that the rarely used and almost forgotten AppArmor (available in SUSE by default) can do wonders for application security.

  7. Monitoring system performance. In the simplest form, the output of the System Activity Reporter (sar) can be processed and displayed. Sar is a simple and very good tool. It came to Solaris from AT&T System V Unix and was later adopted by all other flavors of Unix, including Linux. This solution should always be implemented first, before any more complex variants of performance monitoring are even considered. Intel provides good performance monitoring tools with its compiler suite.
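The LOGHOST approach described in item 1 can be sketched in two small pieces of Python:
- `report()` is called by a cron-driven probe on each host. It logs the probe's result through the local syslog daemon, which is assumed to forward everything to the LOGHOST.
- `pick_up()` runs on the LOGHOST and extracts the non-OK probe messages from the consolidated log.

Several conventions here are hypothetical: the `probe` tag, the `local0` facility and the `check=... severity=...` message layout. The parser assumes the traditional BSD-style line format (`Mar  1 10:00:00 host tag: message`). Adjust the regex if your rsyslog template differs.

```python
import re
import syslog
from typing import Dict, Iterable, List

TAG = "probe"  # hypothetical syslog ident shared by all local probes
LEVELS = {"OK": syslog.LOG_INFO, "WARNING": syslog.LOG_WARNING, "CRITICAL": syslog.LOG_CRIT}


def report(check: str, severity: str, message: str) -> None:
    """On each host, called from a cron-driven probe: log one result to the
    local syslog daemon, which is assumed to forward it to the LOGHOST."""
    syslog.openlog(TAG, 0, syslog.LOG_LOCAL0)
    syslog.syslog(LEVELS[severity], f"check={check} severity={severity} {message}")
    syslog.closelog()


# Traditional line format: "Mon DD HH:MM:SS host tag[pid]: message"
LINE = re.compile(
    r"^\w{3}\s+\d+\s+[\d:]+\s+(?P<host>\S+)\s+" + re.escape(TAG)
    + r"(?:\[\d+\])?: check=(?P<check>\S+) severity=(?P<severity>\S+) (?P<msg>.*)$"
)


def pick_up(lines: Iterable[str]) -> List[Dict[str, str]]:
    """On the LOGHOST: return the non-OK probe results found in the log."""
    found = []
    for line in lines:
        m = LINE.match(line.rstrip("\n"))
        if m and m.group("severity") != "OK":
            found.append(m.groupdict())
    return found
```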
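For configuration change tracking (item 2), a minimal sketch is a checksum baseline. On every run, hash a list of watched files, compare the result with the previous run, and report what was added, removed or modified. The file list and the baseline location are up to you. This is not a replacement for version control or a configuration management tool, only the simplest form of the same idea.

```python
import hashlib
import json
import os
from typing import Dict, Iterable


def fingerprint(paths: Iterable[str]) -> Dict[str, str]:
    """SHA-256 of each watched file; missing files are recorded as 'absent'."""
    result = {}
    for path in paths:
        try:
            with open(path, "rb") as f:
                result[path] = hashlib.sha256(f.read()).hexdigest()
        except FileNotFoundError:
            result[path] = "absent"
    return result


def diff(old: Dict[str, str], new: Dict[str, str]) -> Dict[str, str]:
    """Classify each changed path as added, removed or modified."""
    changes = {}
    for path in sorted(set(old) | set(new)):
        before, after = old.get(path, "absent"), new.get(path, "absent")
        if before == after:
            continue
        if before == "absent":
            changes[path] = "added"
        elif after == "absent":
            changes[path] = "removed"
        else:
            changes[path] = "modified"
    return changes


def check(paths: Iterable[str], baseline_file: str) -> Dict[str, str]:
    """Compare against the stored baseline, then store the current state."""
    current = fingerprint(paths)
    changes: Dict[str, str] = {}
    if os.path.exists(baseline_file):
        with open(baseline_file) as f:
            changes = diff(json.load(f), current)
    with open(baseline_file, "w") as f:
        json.dump(current, f)
    return changes
```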
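The world-writable and home directory checks mentioned in item 6 take only a few lines. The sketch below walks a directory tree and reports entries writable by others. It skips symlinks, and it skips sticky world-writable directories such as /tmp, where that mode is intended. It also separately lists home directories writable by group or others. The skip list is an assumption. On a real server you would also want to stay on local filesystems rather than crossing into NFS mounts.

```python
import os
import stat
from typing import Iterator, List, Sequence


def world_writable(root: str, skip_dirs: Sequence[str] = ("/proc", "/sys", "/dev")) -> Iterator[str]:
    """Yield files and directories under root that are writable by 'other'."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune pseudo filesystems in place so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if os.path.join(dirpath, d) not in skip_dirs]
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # vanished or unreadable
            if stat.S_ISLNK(st.st_mode):
                continue  # symlink permission bits are meaningless
            sticky_dir = stat.S_ISDIR(st.st_mode) and st.st_mode & stat.S_ISVTX
            if st.st_mode & stat.S_IWOTH and not sticky_dir:
                yield path


def loose_home_dirs(home_root: str = "/home") -> List[str]:
    """Home directories writable by group or others."""
    bad = []
    with os.scandir(home_root) as it:
        for entry in it:
            if entry.is_dir(follow_symlinks=False):
                mode = entry.stat(follow_symlinks=False).st_mode
                if mode & (stat.S_IWGRP | stat.S_IWOTH):
                    bad.append(entry.path)
    return sorted(bad)
```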

Overcomplexity as the key problem with monster, "enterprise ready", packages

"The big four" dominate the large enterprise space: HP Operations Center (with Operations Manager as the key component), Tivoli, BMC and CA Unicenter. They are very complex and expensive products, which require dedicated staff and provide a relatively low return on investment. This is especially true taking into account the TCO, which increases dramatically with each new version due to overcomplexity. In a way, the dominant vendors have painted themselves into a corner by raising complexity far above the level a normal sysadmin can bear.

My experience with the big troika is mainly with HP Operations Manager and with "classic" Tivoli, that is, Tivoli before the Candle (aka Tivoli Monitoring 6.1) and Micromuse (aka Netcool) acquisitions. Still, I think this statement reflects the reality of all "big vendor ESM products". All the vendors mentioned use overcomplexity as a shield against competitors and as a way to extract rent from customers. IBM is especially guilty of such "incorrect" behavior. It has become very greedy, resorting to dirty tricks such as licensing its software products per socket or, worse, per core. You should reject such offers as a matter of prudence. You can use your money at least ten times more efficiently on a decent open source product with professional support. Puppet is one example: while not a monitoring system per se, it duplicates much of this functionality. Nothing in the monitoring space even remotely justifies licensing per socket or per core. Let Wall Street firms use those wonderful products, since only for them is one million more or less a rounding error.

Also, architectural thinking is either completely absent or very weak, yet new versions of such commercial systems are produced with excessive frequency to keep the ball in play. Meanwhile, the underlying technologies can be ridiculously outdated. Those products often use obsolete or semi-obsolete architectures and sometimes obscure, outdated protocols that are difficult to understand and debug. In the latter case, the products become the source of hidden (or not so hidden) security vulnerabilities. This is actually not limited to monitoring tools; it is typical of any large, complex enterprise application. HP Data Protector, with its free root telnet to all nodes in insecure mode, comes to mind. In a way, the agents on each server should always be viewed as hidden backdoors, not that different from the backdoors hackers use for "zombification" of servers. That does not mean that agentless tools are more secure. If they use protocols such as SSH to run remote probes, the "mothership" server that hosts such a system becomes a "key to the kingdom" too. This is a pretty typical situation for tools such as Nagios and HP SiteScope.

For major vendors of monitoring products with a substantial installed user base, overcomplexity is to a certain extent unavoidable. They need to increase complexity with each version out of a feeling of insecurity and the desire to protect and extend their franchise. What is bad is that overcomplexity is used as a means of locking in users and as a shield against competitors. At the same time, it helps to extract rent from existing customers: the more complex the tool, the more profitable the various training classes. Certain vendors simply cannot, and do not want to, compete on the basis of the functionality they provide. They need lock-in to survive and prosper.


In a way, these pressures are very similar to those that destroyed US investment banks in the recent "subprime mess". Such pressures logically push vendors down a road that inevitably turns their systems into barely manageable monsters. These systems can still be very scalable despite the overcomplexity, but the flexibility of the solutions and the quality of the interface suffer greatly. Only the high quality and qualifications of tech support allow those systems to be maintained and to remain stable in a typical enterprise.

That opens some space for open source monitoring solutions, which can be much simpler and rely much more on established protocols (for example, HTTP, SMTP and SSH). An important fact that favors simpler solutions is that, in any organization, the usefulness of a monitoring package is limited by the ability of personnel to tweak it to the environment. Packages whose tuning is above the heads of the personnel can actually be harmful. Tivoli Monitoring 5.1, with its complex API and JavaScript-based extensions, is a nice example of the genre.


Adequate (and very expensive) training for those products is often skipped as overhead. It's therefore not surprising that many companies never get more than the most basic functionality out of a very expensive (and theoretically capable) product. And basic functionality is better provided by simple free or low-cost packages. So extremes meet. This situation might be called the system monitoring paradox. That's exactly what has kept Tivoli, HP Operations Center, BMC and CA Unicenter consultants happy and in business for many years.

The system monitoring paradox is that both expensive and cheap monitoring solutions usually provide very similar quality of monitoring, and both have adequate capabilities for a typical large company.

It costs quite a lot to maintain and customize tools like Tivoli or OpenView in a large enterprise environment, where money for this is readily available. Keeping a good monitoring specialist on the job is also a problem: once a person becomes really good at scripting, they tend to move to other, more interesting areas, like web development. There is nothing too exciting in the daily work of a monitoring specialist, and after a couple of years the feeling that his or her IQ is underutilized is to be expected. So most capable people typically move on. The strong points of the big troika are support and the availability of professional services, but the costs are very high. It is important to understand, though, that complex products to a certain extent reflect the complexity of the large datacenter environment, and not all tasks can be performed by simple products, although 80% might be a reasonable estimate of the share that can.

That means that the $3.6 billion market for enterprise system management software is ripe for competition from products that use scripting languages instead of trying to foresee each and every need an enterprise can have. Providing a simple scripting framework for writing probes, and implementing the event log, dashboard and configuration viewer on a web server, lowers the barrier to entry.

But such solutions are not in the interest of large vendors, as they can lower their profits. They cannot, or do not want to, compete in this space. What is interesting is that scripting-based monitoring solutions are pretty powerful and have proved to be competitive with much more complex "pre-compiled" or Java-based offerings. There are multiple scripting-based offerings from startups and even individual developers that can deliver 80% of the benefits of the big troika products for 20% of the cost or less, and without millions of lines of Java code, an army of consultants and IT managers, and annual conferences for the big brass.

In other words, "something is rotten in the state of Denmark" (Hamlet).

The role of scripting languages


Scripting languages beat Java in the area of monitoring hands down, and if a monitoring product is written in a scripting language and/or is extendable using a scripting language, this should be considered a strategic advantage. An advantage that is worth fighting for.

First of all, because the codebase is more maintainable and flexible. Integration of plug-ins written in the same scripting language is simpler. Debugging problems is much simpler. Everything is simpler because a scripting language is a higher-level language than Java or C#. But at the same time I would like to warn that open source is not a panacea, and it has its own (often hidden) costs and pitfalls. In a corporate environment, all other things being equal, you are better off with an open source solution that has at least one start-up behind it. A badly configured or buggy monitoring package can be a big security risk. In no way does that mean that, say, Tivoli installations in the real world are secure, but they are more obscure, and security via obscurity works pretty well in the real world ;-)

Let's reiterate the key problems with monster, "enterprise ready", packages:

Architectural Issues

If you are designing a monitoring solution, you need to solve almost a dozen pretty complex design problems. The ingenuity and flexibility of the solution to each of those problems determine the quality of the architecture. Among those that we consider the most important are:

  1. Probe architecture. Probe architecture should provide a simple and flexible way to integrate the existing capabilities of the system (especially existing system utilities, including classic Unix utilities) and convert them into usable alerts. Perl is the simplest way to achieve that, as it blends very well into the Unix environment and is often used by system administrators for other purposes, so they do not need to learn yet another language. Probes can communicate two major things:

    Often the interface with the "mothership" is delegated to a special agent (an adapter in Tivoli terminology), which contains all the complex machinery necessary for transmitting events to the event server using some secure or not very secure protocol. In this case probes communicate with the agent. In the simplest case the agent can be the syslogd daemon, an SMTP daemon or a simple HTTP client (if HTTP is used for communication with the mothership).

  2. The structure of the event. The structure of the event should be convenient for transmitting information from the probe and usually consists of a certain number of predefined fields (hostname, timestamp, name of the probe, etc.) and any number of user-definable fields. Generally, C-structure-based events are flexible enough to describe a large variety of events and are also convenient for representing events hierarchically, so that you can reuse more basic events to create derivatives (inheritance). The ability to create new events using inheritance is really convenient. In this sense BAROC is not that bad (although its fixed-length strings suck badly and should be replaced with variable-length strings). The description of an event should also provide for default values (as in BAROC) and possibly for tagging fields that should be ignored in duplicate detection.
  3. Protocol for communication between agents (and the set of probes on the endpoint) and the "mothership". The reliability and the cost of communication between probes and the "mothership" are important. Reusing an existing protocol such as HTTP, SMTP, SYSLOG or SNMP, or some combination of them, provides important advantages over reinventing the wheel. In the simplest case an existing protocol like syslog or SMTP can be used. SMTP has actually proved to be an attractive option: it already exists on most servers, and its built-in buffering and failover capabilities satisfy almost all the requirements for transferring events to the mothership. Flexible email clients with scripting capabilities (like IBM Lotus Notes and Microsoft Outlook) can also be used as message consoles, and they are far superior to the typical event console provided with major products. They can be adapted to provide the ability to react to typical messages.

    In the simplest case the agent can be a standalone executable that is invoked by each probe via a pipe (a "send event" type of agent). In this case HTML/XML-based protocols are natural (albeit more complex and more difficult to parse than necessary), although SMTP-style keyword-value pairs are also pretty competitive and much simpler. The only problem is long, multiline values, but here the body of the SMTP message can be used instead of extended headers. Unix also provides the necessary syntax in "here" documents.

    For efficiency an agent can be coded in C, although on modern machines this is not strictly necessary. If HTTP is used, any command-line browser like lynx can serve as a "poor man's agent". In this case the communication with the server needs to be organized via forms.

    I would like to stress that SMTP mail, as imperfect as it is, proved to be a viable communication channel for transmitting events from probes to the "mothership" and then distributing them to interested parties. 

  4. The protocol for delivery of probes to remote locations and running them (protocols like ssh can be used both for delivery and for execution, as is the case in so-called "agentless" designs).
  5. Aggregation and pre-filtering of events. These are the simplest types of correlation, and due to their importance they should be considered separately and designed and implemented on a different level than a full-fledged correlation solution. Here regular expression capabilities are more than enough and you do not need anything more complex. The common solution, used, for example, in Tivoli, is to use gateways for this purpose. Gateways can be just another instance of the same "master system" or a different, more specialized version.

    One simple and effective way of aggregation is converting events into "tickets": groups of events that correspond to a serviceable entity (for example, a server).

  6. Event correlation engine. This engine should provide a flexible way to filter and correlate events. This is a pretty complex part of the monitoring solution, as the correlation engine operates on a "window" of current events, and that window should be constantly updated and provide a view of a certain number of past events in a round-robin fashion. Perl arrays are a good approximation of the functionality required for such an event window (updatable slots, the order is important, and there should be a capability to delete an event after a certain amount of time even if it was not displaced by more current events). The simplest correlation engines are usually SQL-based and operate against a special database that is totally memory-based. More complex ones are Prolog-based. I do not see why a scripting language like Perl cannot be used as a correlation engine with a proper library.
  7. The way to schedule and run remote probes with the ability to "rerun failed only". This can be done via local scheduling and, say, the ssh protocol; on the local host with the possibility of remote updates of schedules; via remote scheduling; or some combination (for example, a remote schedule can be generated for the next 24 hours, while the "master schedule" from which it is derived is maintained on the mothership to cut complexity and simplify maintenance).
  8. The sub-architecture for collecting information from probes and displaying both the status of the systems (dashboard) and the event log. Typically a web server is used for both the dashboard and the event log, but there are big differences between systems in implementation details. The simplest event log can be implemented via an email browser (mail client), and email browsers are typically more flexible than many more specialized solutions. This is actually a strong argument for using the SMTP message format. For dashboards, most advanced monitoring packages now use AJAX, some use Java, etc. Actually,   can serve as a source of inspiration for a flexible and robust dashboard.
  9. The way of forwarding event information to "action scripts" or other systems. This really determines the flexibility of the system, as in the current enterprise environment no system can fill all needs. So the ability to play nice on both horizontal and vertical integration levels is really important. A "ticker-based" system, in which an agent or cron script sends "tickers" to the mothership, has proved to be flexible and powerful. Even half a dozen simple check results (which can be implemented via a single probe, for example one in /etc/cron.hourly/, with results sent to the LOGHOST server) can provide a pretty decent level of OS monitoring if this is done along with syslog analysis (and not as an isolated activity).
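Items 5 and 9 above (regex pre-filtering, aggregation into per-server "tickets") can be illustrated with a few lines of awk. This is a minimal sketch, not any product's implementation: it assumes a hypothetical event format of one `host|severity|message` line per event, and the function name `make_tickets` is made up for illustration.

```shell
#!/usr/bin/env bash
# make_tickets IGNORE_REGEX
# Reads events as "host|severity|message" lines on stdin (a hypothetical
# format), drops events whose message matches IGNORE_REGEX (pre-filtering)
# and aggregates the rest into one "ticket" line per host.
# Note: awk -v interprets backslash escapes, so double any backslashes.
make_tickets() {
    awk -F'|' -v ignore="$1" '
        ignore != "" && $3 ~ ignore { next }       # pre-filter known noise
        {
            count[$1]++                            # events per serviceable entity
            if (!($1 in first)) first[$1] = $3     # remember the first message
        }
        END {
            for (h in count)
                printf "%s events=%d first=%s\n", h, count[h], first[h]
        }' | sort
}
```

For example, `make_tickets 'ntpd|time reset' < events.txt` (file name hypothetical) collapses a burst of messages from one server into a single line that can be mailed or shown on a dashboard.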

These questions make sense for users too: if you are able to answer them for a particular monitoring solution, you pretty much understand its architecture.

Not all components of the architecture need to be implemented. The most essential are probes. At the beginning everything else can be reused via available subsystems/protocols. Typically the first probe implemented is the one monitoring disk free space ;-) But even if you run pretty complex applications (for example, a LAMP stack), you can assemble your own monitoring solution just by integrating ssh, custom shell/Perl/Python scripts (some can be adapted from existing solutions, for example from mon) and an Apache server. Basic HTML tables serve well as a simple but effective dashboard and are easy to generate, especially from Perl. SSH has proved to be adequate as an agent and data delivery mechanism. You can even run probes via ssh (the so-called agentless solution), but this has an obvious drawback in comparison with running them from cron: if the server is overloaded or the ssh daemon malfunctions, the only thing you can say is that you can't connect. Other protocols such as syslog might still be operative, and probes that use them can still deliver useful information. If you run your probes from, say, /etc/cron.hourly (very few probes need to be run more often, because in large organizations, as in dinosaurs, the reaction is very slow and nothing can be done in less than an hour ;-), you can automatically switch to syslog delivery if, for example, your ssh delivery does not work. Such an adaptive delivery mechanism, in which the best channel for delivering "tick" information is determined on the fly, is more resilient.
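The adaptive delivery described above can be sketched as a function that tries scp first and falls back to syslog. This assumes the local syslogd already forwards `user.*` messages to LOGHOST; the defaults for `LOGHOST` and `REMOTE_DIR`, the tag `probe_tick` and the function name `deliver_tick` are hypothetical.

```shell
#!/usr/bin/env bash
# deliver_tick FILE
# Adaptive delivery of a tick file: try ssh (scp) to the mothership first and,
# if that fails, fall back to syslog via logger, relying on syslogd forwarding
# to LOGHOST. Prints the channel actually used.
LOGHOST=${LOGHOST:-loghost}                      # hypothetical mothership name
REMOTE_DIR=${REMOTE_DIR:-/tmp/probes_collector}

deliver_tick() {
    local tick_file=$1 line
    local host=${HOSTNAME:-$(uname -n)}
    # Primary channel: BatchMode avoids hanging on a password prompt,
    # ConnectTimeout keeps an overloaded peer from stalling the cron job.
    if scp -q -o BatchMode=yes -o ConnectTimeout=10 \
           "$tick_file" "$LOGHOST:$REMOTE_DIR/$host" 2>/dev/null; then
        echo ssh
        return 0
    fi
    # Fallback channel: one syslog message per tick line.
    while IFS= read -r line; do
        logger -t probe_tick -p user.warning -- "$line"
    done < "$tick_file"
    echo syslog
}
```

A cron job would call `deliver_tick /tmp/probe_dir/tick` after running its probes; the printed channel name can itself be logged, so that a server that keeps falling back to syslog stands out.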

The simplest script that runs probes sequentially and can be called from cron might look something like this:

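POLLING_INTERVAL=60                      # pause between probes, in seconds
PROBE_DIR=/usr/local/monitor/probes
TICK_FILE=/tmp/probe_dir/tick

mkdir -p /tmp/probe_dir
: > "$TICK_FILE"                         # start each run with an empty tick file
for probe in "$PROBE_DIR"/* ; do
   [ -x "$probe" ] || continue           # skip files that are not executable
   "$probe" >> "$TICK_FILE" 2>&1         # run the probe, append its output to the tick file
   sleep "$POLLING_INTERVAL"
done

# ship the collected ticks to the mothership (LOGHOST must be set)
scp "$TICK_FILE" "$LOGHOST:/tmp/probes_collector/$(uname -n)"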

Another approach is to "inject" each server's local crontab with the necessary entries once a day and rely on the local cron daemon for scheduling. This offloads a large part of the scheduling load from the "mothership" and at the same time provides enough flexibility (some local cron scripts can be mini-schedulers in their own right).
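A minimal sketch of such injection, assuming the mothership pushes a crontab fragment to each server (for example over ssh) and the monitoring entries live between two marker comments. The marker text and the function name `merge_crontab` are hypothetical. Entries outside the markers are left untouched, so re-running it daily is idempotent.

```shell
#!/usr/bin/env bash
# Hypothetical markers delimiting the block managed by the mothership.
BEGIN_MARK='# BEGIN monitoring (managed by mothership)'
END_MARK='# END monitoring'

# merge_crontab CURRENT FRAGMENT
# Prints CURRENT with the managed block replaced by the contents of FRAGMENT
# (the block is appended if it does not exist yet).
merge_crontab() {
    local current=$1 fragment=$2
    # Drop any previously injected block, keep all other entries as they are.
    awk -v b="$BEGIN_MARK" -v e="$END_MARK" '
        $0 == b { skip = 1; next }
        $0 == e { skip = 0; next }
        !skip' "$current"
    printf '%s\n' "$BEGIN_MARK"
    cat "$fragment"
    printf '%s\n' "$END_MARK"
}
```

On each server the result can be installed with something like `crontab -l > /tmp/cron.cur 2>/dev/null || : > /tmp/cron.cur; merge_crontab /tmp/cron.cur /tmp/monitoring.cron > /tmp/cron.new && crontab /tmp/cron.new` (file names hypothetical).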

As for representation of the results on the "mothership" server, local probes can typically be made capable of generating HTML and submitting it as a reply to some form on the web server running on the mothership, which performs additional rendering and maintenance of history, trends, etc. (see  for inspiration). Creating a convenient event viewer and dashboard is a larger and more complex task, but basic functionality can be achieved without too much effort using Apache, an off-the-shelf webmail client (used as the event viewer) and some CGI scripts. Again, adaptability and programmability are much more important than fancy capabilities.
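Submitting a probe's HTML output to a form on the mothership can be done with curl. This is a minimal sketch: the URL, the form field names `host` and `report` and the function name `submit_report` are hypothetical and must match whatever CGI script receives the data.

```shell
#!/usr/bin/env bash
# submit_report URL REPORT_FILE
# POSTs an HTML report produced by local probes to a form handler (for example
# a CGI script) on the mothership. The URL and the form field names "host"
# and "report" are hypothetical; match them to your own CGI script.
submit_report() {
    local url=$1 report=$2
    # --data-urlencode "name@file" reads and URL-encodes the file contents;
    # -f makes curl exit non-zero on HTTP errors so cron can notice failures.
    curl -fsS --max-time 30 \
         --data-urlencode "host=${HOSTNAME:-$(uname -n)}" \
         --data-urlencode "report@$report" \
         "$url"
}
```

A typical call would be `submit_report http://loghost/cgi-bin/tick.cgi /tmp/probe_dir/report.html`, with the CGI script on the mothership storing the report per host for history and trends.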


For example, you can write a Perl script that generates an HTML table containing the status of your devices. In such a table color bars can represent the status of each server (for example, Green=GOOD, Yellow=LATENCY >100ms, Red=UNREACHABLE). See Set up customized network monitoring with Perl. I actually like the design of  interface very much and consider it to be a good prototype for generic system monitoring, as it is customizable and fits the needs of server monitoring reasonably well. For example, the concept of portfolios is directly transferable to the concept of groups of servers or locations.
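The same kind of status table can be produced in a few lines of shell (shown here instead of Perl to keep one language across the examples). A minimal sketch, assuming some other probe already produces `host latency_ms` lines with `-` for no reply and integer latencies; the function name `status_table` is hypothetical.

```shell
#!/usr/bin/env bash
# status_table
# Reads "host latency_ms" lines on stdin (latency "-" means no reply; latency
# is assumed to be an integer number of milliseconds) and prints an HTML table
# using the color scheme from the text.
status_table() {
    local host latency color status
    echo '<table border="1">'
    echo '<tr><th>Server</th><th>Status</th></tr>'
    while read -r host latency; do
        if [ "$latency" = "-" ]; then
            color=red;    status=UNREACHABLE
        elif [ "$latency" -gt 100 ]; then
            color=yellow; status="LATENCY ${latency}ms"
        else
            color=green;  status=GOOD
        fi
        printf '<tr><td>%s</td><td bgcolor="%s">%s</td></tr>\n' \
               "$host" "$color" "$status"
    done
    echo '</table>'
}
```

Writing the output into a file under the Apache document root (for example `status_table < latency.txt > /var/www/html/status.html`, paths hypothetical) gives the primitive dashboard described in the text.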

Similarly, any webmail implementation represents an almost complete implementation of the event log. If it is written in a scripting language, it can be gradually adapted to your needs (instead of reinventing the bicycle and writing the event log software from scratch). I would like to reiterate that this is a very strong argument for an SMTP-based or SMTP-compatible/convertible structure of events, for example, a sequence of lines with the structure

keyword: value

until a blank line, followed by the text part of the message.
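Producing and reading such events needs nothing more than a "here" document and awk. A minimal sketch: the header names (Host, Date, Probe, Severity, Subject) are illustrative assumptions rather than a standard schema, and `emit_event` and `get_field` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# emit_event PROBE SEVERITY SUMMARY
# Prints one event in SMTP-style "keyword: value" format, a blank line, then
# free-form text read from stdin (the message body).
# The header names are illustrative assumptions, not a standard schema.
emit_event() {
    local probe=$1 severity=$2 summary=$3
    cat <<EOF
Host: ${HOSTNAME:-$(uname -n)}
Date: $(date -u '+%Y-%m-%dT%H:%M:%SZ')
Probe: $probe
Severity: $severity
Subject: $summary

EOF
    cat        # body: multi-line details copied from stdin
}

# get_field NAME
# Reads an event on stdin and prints the value of header NAME
# (headers end at the first blank line).
get_field() {
    awk -v key="$1" '
        /^$/ { exit }                                      # end of headers
        index($0, key ": ") == 1 { print substr($0, length(key) + 3); exit }'
}
```

Because the result is a valid message layout, a probe can pipe it straight into the local mailer, for example `df -hP | emit_event df_check WARNING "/var is 91% full" | mail -s "df_check WARNING" monitor@loghost` (address hypothetical), and the long multiline details travel in the body rather than in extended headers.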

Using the paradigm of small reusable components is the key to creating a flexible monitoring system. Even in a Windows environment you can now do wonders using Cygwin or the free Microsoft analog, Windows Services for UNIX (SFU 3.5). SSH solves the pretty complex problem of component delivery and updates over a secure channel, so, all other things being equal, it might be preferable to installing often buggy and insecure local agents (and that includes many misconfigured Tivoli installations). Actually this is not completely true: a local installation of Perl can serve as a very powerful local agent, with probe scripts sending information, for example, to a web server. And Perl is installed by default on all major Unixes and Linux. In the most primitive form, refreshing of information from probes can be implemented as automatic refresh of HTML pages in frames. But there are multiple open source monitoring packages in which people have worked on refining those ideas for several years, and you need to critically analyze them and select the package that is most suitable for you.


Simplicity pays great dividends in monitoring, as you can add your own customization with much less effort and without spending an inordinate amount of time studying obscure details of an excessively complex architecture.

I would recommend starting with a very simple package written in Perl (which every Unix sysadmin should know ;-) and later, when you understand the issues and compromises inherent in the design of monitoring for your particular environment (which can deviate from the typical one in a number of ways), moving up in complexity. The return on investment in fancy graphs is usually less than expected after the first two or three days (outside presentations to executives), but your mileage may vary. If you need graphic output, then you definitely need a more complex package that does the necessary heavy lifting for you. It does not make much sense to reinvent the bicycle again and again, but if you need graphs only occasionally, a spreadsheet can usually create complex graphs from tables, and some spreadsheets are highly programmable.


Open source packages show great promise in monitoring and, in my opinion, can compete with packages from traditional vendors in the small and medium-size enterprise space. The only problematic area is event correlation, but even here you can do quite a lot simply by using any SQL database (preferably a memory-based one) to manipulate the "event window".
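The "event window" idea can be tried even before a database is involved. The sketch below swaps SQL for plain awk: it keeps only events newer than a time window and flags hosts with too many of them, which is a simple count-based correlation rule. The input format `epoch_seconds host message...` and the function name `correlate_window` are hypothetical.

```shell
#!/usr/bin/env bash
# correlate_window NOW WINDOW THRESHOLD
# Reads events as "epoch_seconds host message..." lines on stdin, ignores events
# older than WINDOW seconds before NOW (they have "expired" from the window)
# and prints every host with at least THRESHOLD events left in the window.
correlate_window() {
    awk -v now="$1" -v win="$2" -v thr="$3" '
        $1 > now - win { count[$2]++ }             # event is inside the window
        END {
            for (h in count)
                if (count[h] >= thr) print h, count[h]
        }' | sort
}
```

Run from cron as, say, `correlate_window "$(date +%s)" 3600 5 < events.txt` (file name hypothetical), it reports servers that produced five or more events in the last hour; an in-memory SQL table would express the same rule as a `GROUP BY host HAVING COUNT(*) >= 5` query over a time-filtered window.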

The key question in adopting an open source package is deciding whether it can satisfy your needs and has an architecture that you consider logical enough to work with. This requirement translates into the time and patience necessary to evaluate candidates. I hope that this page (and the relevant subpages) might provide some starting points and hints on where to look. Also, with AJAX the flexibility and quality of open source web-server-based monitoring consoles have increased dramatically. Again, for the capabilities of the AJAX technology you can look at

Even if the company anticipates buying a commercial product, creating a prototype using open source tools might pay off in a major way, giving the ability to cut through the thick layer of vendor hype to the actual capabilities of a particular commercial application. Even in a production environment, simplicity and flexibility can compensate for a less polished interface and the lack of certain more complex capabilities, so I would like to stress again that in this area open source tools look very competitive with complex and expensive commercial tools like Tivoli.

The tales about the overcomplexity of the Tivoli product line are simply legendary and we will not repeat them here. But one lesson emerges: simple applications can compete with very complex commercial monitoring solutions for one simple reason: overcomplexity undermines both reliability and flexibility, the two major criteria for a monitoring application. Consider the criteria for a monitoring application to be close to the criteria for handguns or rifles: it should not jam in sand and water.


Classification of open source  monitoring packages based on their complexity

If you use a ticker-based architecture, in which individual probes run from a cron script on each server and push "ticks" to the "mothership" (typically the LOGHOST server), where they are processed by a special "electrocardiogram" script each hour (or every 15 minutes if you are impatient ;-), you can write a usable variant with half a dozen of the most useful checks (uptime check for overload, df check for missing mounts and free space, log check for strange or too many messages per interval, status check for a couple of critical daemons, and a couple of others) in, say, 40-80 hours in shell. Probably less if you use Perl (you can also use both: probes in shell and the electrocardiogram script in Perl). Probes should generally be written in a uniform style and use a common library of functions. This is easier done in Perl, but if the server is heavily loaded, such probes might not run. Ticks can be displayed via a web server, providing a primitive dashboard.
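The core of such an "electrocardiogram" script is detecting a flat line: a server whose tick has stopped arriving. A minimal sketch, assuming each server's tick file lands in one directory on LOGHOST (for example /tmp/probes_collector, as in the cron script above) and is named after the server; the function name `check_ticks` is hypothetical.

```shell
#!/usr/bin/env bash
# check_ticks DIR MAX_AGE_MIN
# Run on LOGHOST: every file in DIR is assumed to be one server's tick file,
# named after the server. Servers whose tick is older than MAX_AGE_MIN minutes
# show a "flat line" and are reported as STALE.
check_ticks() {
    local dir=$1 max_age=$2 f
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        # find prints the file only if it was last modified more than
        # max_age minutes ago (-mmin is supported by GNU and BSD find)
        if [ -n "$(find "$f" -mmin +"$max_age")" ]; then
            echo "STALE ${f##*/}"
        else
            echo "OK ${f##*/}"
        fi
    done
}
```

With hourly probes, `check_ticks /tmp/probes_collector 75` leaves some slack for slow runs; the STALE lines can then be mailed or rendered on the dashboard, and the contents of fresh tick files checked for WARN/CRIT lines.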

If you are a good programmer you can probably write such a system in one evening, but as Russians say, "the appetite comes during a meal…", and this system needs to evolve for at least a week to be really usable and satisfy real needs. BTW, writing a good, flexible "filesystem free space" script is a real challenge, despite the fact that the task is really simple. The simplest way to start might be to rely on individual "per server" manifests (edited outputs of df from the server), which specify which filesystems to check and what their upper limits are, and one "universal" config file that holds default percentages uniform across the servers.
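A minimal sketch of such a manifest-driven check in shell and awk. It assumes a hypothetical manifest format of one `mountpoint [max_pct]` per line (a missing limit falls back to the universal default, passed as an argument) and POSIX `df -P` output on stdin; mount points containing spaces are not handled, and `check_df` is a hypothetical name.

```shell
#!/usr/bin/env bash
# check_df MANIFEST DEFAULT_PCT
# Reads `df -P` output on stdin. For every mount point listed in MANIFEST
# ("mountpoint [max_pct]" per line, "#" comments allowed; the manifest must
# not be empty) reports MISSING if it is not mounted and FULL if its usage
# exceeds its limit (DEFAULT_PCT when no per-mount limit is given).
check_df() {
    local manifest=$1 default_pct=$2
    awk -v def="$default_pct" '
        NR == FNR {                                # first file: the manifest
            if ($0 !~ /^#/ && NF) limit[$1] = (NF > 1 ? $2 : def) + 0
            next
        }
        FNR > 1 {                                  # stdin: df -P, skip header
            used = $5; sub(/%/, "", used); pct[$6] = used + 0
        }
        END {
            for (m in limit) {
                if (!(m in pct))            print "MISSING", m
                else if (pct[m] > limit[m]) print "FULL", m, pct[m] "%"
            }
        }' "$manifest" - | sort
}
```

A probe would call it as `df -P | check_df /usr/local/monitor/df.manifest 90` (path hypothetical); an empty output means all listed filesystems are mounted and within their limits.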

There are several interesting open source monitoring products, each of which tries "to reinvent the bicycle" in a different way (and/or convert it into a moped ;-) by adding heartbeat, graphic and statistical packages, AJAX, improved security and storage of events in a backend database. But again, the essence of monitoring is reliability and flexibility, not necessarily the availability of eye-popping Excel-style graphs.

A Unix monitoring system is a tool by sysadmins for sysadmins and should be useful primarily for this purpose, not for occasional demonstrations to the vice-president for IT of a particular company. That means that even among open source monitoring systems, not all belong to the same category, and we need to distinguish between them based on both the implementation language and the complexity of the codebase.

As in boxing, there should be several categories (the use of a scripting language and the size of the codebase are the main criteria used here):

Weight class: examples
Featherweight: mon (Perl)
Lightweight: Spong (Perl)
Middleweight: Big Sister (Perl)
Super middleweight: OpenSMART (Perl), ZABBIX (PHP, C; agent and agentless)
Light heavyweight: Nagios (C; agentless, primitive agent support), OpenNMS (Java)
Heavyweight: Tivoli (old product line mostly in C++, new line mostly Java), OpenView, Unicenter

Some useful features in monitoring packages

One very useful feature is the concept of server groups: servers that have similar characteristics. That gives you the ability to perform group probes and/or configuration file changes for the whole group as a single operation. Groups are actually sets, and standard set operations can be performed on them. For example, HTTP servers have evolved into a highly specialized class of servers and can benefit from less generic scripts to monitor key components, but in your organization they can belong to a larger group of RHEL 6.8 servers. The same is true for DNS servers, mail servers and database servers.
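If each group is kept as a plain file with one hostname per line, the set operations mentioned above map directly onto `sort` and `comm`. A minimal sketch; the group file layout and the function names are hypothetical, and bash is needed for the `<( )` process substitution.

```shell
#!/usr/bin/env bash
# Server groups as sets: each group is a plain file with one hostname per
# line (a hypothetical layout, e.g. /usr/local/monitor/groups/http).
# Requires bash for the <( ) process substitution used with comm.
group_and()   { comm -12 <(sort -u "$1") <(sort -u "$2"); }  # intersection
group_or()    { sort -u "$1" "$2"; }                         # union
group_minus() { comm -23 <(sort -u "$1") <(sort -u "$2"); }  # in $1 but not in $2
```

A group probe then becomes a loop such as `for h in $(group_and groups/http groups/rhel68); do ssh "$h" /usr/local/monitor/probes/check_httpd; done` (probe name hypothetical).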

Another useful feature is a hierarchical layout of HTML pages that provides a nice general picture (in the most primitive form, 3-5 animated icons for the "big picture": OK, warnings, problems, serious problems, dead) with the ability to drill down in more detail, on multiple levels, for each icon. Generic groupings of servers can include, for example:

Dr. Nikolai Bezroukov


Old News ;-)

Always listen to experts. They'll tell you what can't be done, and why. Then do it.

-- Robert Heinlein

[Feb 07, 2019] Installing Nagios-3.4 in CentOS 6.3 LinTut


Nagios is open source software used for network and infrastructure monitoring. Nagios monitors servers, switches, applications and services. It alerts the system administrator when something goes wrong and alerts again when the issue has been rectified.

Enable the EPEL repository first (see How to Enable EPEL Repository for RHEL/CentOS 6/5), then install Nagios and its dependencies:
yum install nagios nagios-devel 'nagios-plugins*' gd gd-devel httpd php gcc glibc glibc-common

By default, after yum install nagios, the cgi.cfg file lists nagiosadmin as the authorized user name, and /etc/nagios/passwd is used as the htpasswd file. To keep the steps simple, I am using the same name:
# htpasswd -c /etc/nagios/passwd nagiosadmin

Check the below given values in /etc/nagios/cgi.cfg
nano /etc/nagios/cgi.cfg

To give the nagiosadmin user access over HTTP, the /etc/httpd/conf.d/nagios.conf file is used. Below is the nagios.conf configuration for the Nagios server:
cat /etc/httpd/conf.d/nagios.conf
# Last Modified: 11-26-2005
# This file contains examples of entries that need
# to be incorporated into your Apache web server
# configuration file. Customize the paths, etc. as
# needed to fit your system.

ScriptAlias /nagios/cgi-bin/ "/usr/lib/nagios/cgi-bin/"

<Directory "/usr/lib/nagios/cgi-bin/">
#  SSLRequireSSL
   Options ExecCGI
   AllowOverride None
   Order allow,deny
   Allow from all
#  Order deny,allow
#  Deny from all
#  Allow from
   AuthName "Nagios Access"
   AuthType Basic
   AuthUserFile /etc/nagios/passwd
   Require valid-user
</Directory>

Alias /nagios "/usr/share/nagios/html"

<Directory "/usr/share/nagios/html">
#  SSLRequireSSL
   Options None
   AllowOverride None
   Order allow,deny
   Allow from all
#  Order deny,allow
#  Deny from all
#  Allow from
   AuthName "Nagios Access"
   AuthType Basic
   AuthUserFile /etc/nagios/passwd
   Require valid-user
</Directory>

Start httpd and Nagios:

# /etc/init.d/httpd start
# /etc/init.d/nagios start

Note: SELinux and iptables are disabled.

Access the Nagios server at http://nagios_server_ip-address/nagios and log in with the username nagiosadmin and the password you set for the nagiosadmin user.

[Jan 31, 2019] Troubleshooting performance issue in CentOS-RHEL using collectl utility The Geek Diary



By admin

Unlike most monitoring tools that either focus on a small set of statistics, format their output in only one way, run either interactively or as a daemon but not both, collectl tries to do it all. You can choose to monitor any of a broad set of subsystems which currently include buddyinfo, cpu, disk, inodes, InfiniBand, lustre, memory, network, nfs, processes, quadrics, slabs, sockets and tcp.

Installing collectl

The collectl community project is maintained at as well as provided in the Fedora community project. For Red Hat Enterprise Linux 6 and 7, the easiest way to install collectl is via the EPEL repositories (Extra Packages for Enterprise Linux) maintained by the Fedora community.

Once set up, collectl can be installed with the following command:

# yum install collectl

The packages are also available for direct download using the following links:

RHEL 5 x86_64 (available in the EPEL archives)
RHEL 6 x86_64
RHEL 7 x86_64

General usage of collectl

The collectl utility can be run manually via the command line or as a service. Data will be logged to /var/log/collectl/*.raw.gz . The logs will be rotated every 24 hours by default. To run as a service:

# chkconfig collectl on       # [optional, to start at boot time]
# service collectl start
Sample Intervals

When run manually from the command line, the first Interval value is 1. When running as a service, the default sample intervals are as shown below. It might sometimes be desirable to lower these to avoid averaging, for example to 1,30,60.

# grep -i interval /etc/collectl.conf 
#Interval =     10
#Interval2 =    60
#Interval3 =   120
Using collectl to troubleshoot disk or SAN storage performance

The defaults of 10s for all data except process data, which is collected at 60s intervals, are best left as is, even for storage performance analysis.

The SAR Equivalence Matrix shows common SAR command equivalents to help experienced SAR users learn to use Collectl. The following example command will view summary detail of the CPU, Network and Disk from the file /var/log/collectl/HOSTNAME-20190116-164506.raw.gz :

# collectl -scnd -oT -p HOSTNAME-20190116-164506.raw.gz
#         <----CPU[HYPER]-----><----------Disks-----------><----------Network---------->
#Time     cpu sys inter  ctxsw KBRead  Reads KBWrit Writes   KBIn  PktIn  KBOut  PktOut 
16:46:10    9   2 14470  20749      0      0     69      9      0      1      0       2 
16:46:20   13   4 14820  22569      0      0    312     25    253    174      7      79 
16:46:30   10   3 15175  21546      0      0     54      5      0      2      0       3 
16:46:40    9   2 14741  21410      0      0     57      9      1      2      0       4 
16:46:50   10   2 14782  23766      0      0    374      8    250    171      5      75 

The next example will output the 1 minute period from 17:00 – 17:01.

# collectl -scnd -oT --from 17:00 --thru 17:01 -p HOSTNAME-20190116-164506.raw.gz
#         <----CPU[HYPER]-----><----------Disks-----------><----------Network---------->
#Time     cpu sys inter  ctxsw KBRead  Reads KBWrit Writes   KBIn  PktIn  KBOut  PktOut 
17:00:00   13   3 15870  25320      0      0     67      9    251    172      6      90 
17:00:10   16   4 16386  24539      0      0    315     17    246    170      6      84 
17:00:20   10   2 14959  22465      0      0     65     26      5      6      1       8 
17:00:30   11   3 15056  24852      0      0    323     12    250    170      5      69 
17:00:40   18   5 16595  23826      0      0    463     13      1      5      0       5 
17:00:50   12   3 15457  23663      0      0     57      9    250    170      6      76 
17:01:00   13   4 15479  24488      0      0    304      7    254    176      5      70
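When a playback covers many hours, the busy samples can be pulled out of this output with awk. A minimal sketch, assuming the `-scnd -oT` column layout shown above (time in the first column, total CPU percentage in the second); the function name `busy_intervals` is hypothetical.

```shell
#!/usr/bin/env bash
# busy_intervals THRESHOLD
# Reads `collectl -scnd -oT` output on stdin and prints the timestamp and CPU%
# of every sample whose CPU column exceeds THRESHOLD (header lines start
# with "#" and are skipped).
busy_intervals() {
    awk -v thr="$1" '!/^#/ && NF >= 2 && $2 > thr { print $1, $2 }'
}
```

For example, `collectl -scnd -oT -p HOSTNAME-20190116-164506.raw.gz | busy_intervals 15` lists only the samples above 15% CPU, which narrows down the period worth examining with the detailed `-scnD` view.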

The next example will output Detailed Disk data.

# collectl -scnD -oT -p HOSTNAME-20190116-164506.raw.gz

### RECORD    7 >>> tabserver <<< (1366318860.001) (Thu Apr 18 17:01:00 2013) ###

# User  Nice   Sys  Wait   IRQ  Soft Steal  Idle  CPUs  Intr  Ctxsw  Proc  RunQ   Run   Avg1  Avg5 Avg15 RunT BlkT
     8     0     3     0     0     0     0    86     8   15K    24K     0   638     5   1.07  1.05  0.99    0    0

#          <---------reads---------><---------writes---------><--------averages--------> Pct
#Name       KBytes Merged  IOs Size  KBytes Merged  IOs Size  RWSize  QLen  Wait SvcTim Util
sda              0      0    0    0     304     11    7   44      44     2    16      6    4
sdb              0      0    0    0       0      0    0    0       0     0     0      0    0
dm-0             0      0    0    0       0      0    0    0       0     0     0      0    0
dm-1             0      0    0    0       5      0    1    4       4     1     2      2    0
dm-2             0      0    0    0     298      0   14   22      22     1     4      3    4
dm-3             0      0    0    0       0      0    0    0       0     0     0      0    0
dm-4             0      0    0    0       0      0    0    0       0     0     0      0    0
dm-5             0      0    0    0       0      0    0    0       0     0     0      0    0
dm-6             0      0    0    0       0      0    0    0       0     0     0      0    0
dm-7             0      0    0    0       0      0    0    0       0     0     0      0    0
dm-8             0      0    0    0       0      0    0    0       0     0     0      0    0
dm-9             0      0    0    0       0      0    0    0       0     0     0      0    0
dm-10            0      0    0    0       0      0    0    0       0     0     0      0    0
dm-11            0      0    0    0       0      0    0    0       0     0     0      0    0

# KBIn  PktIn SizeIn  MultI   CmpI  ErrsI  KBOut PktOut  SizeO   CmpO  ErrsO
   253    175   1481      0      0      0      5     70     79      0      0
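collectl's text output is plain whitespace-separated columns, so the disk detail block above is easy to post-process. Here is a minimal sketch. It assumes the 14-column layout shown (device name first, Pct Util last) and that a blank line ends the disk block. `flag_busy_disks` is a hypothetical helper, not part of collectl:

```shell
# flag_busy_disks THRESHOLD: read collectl disk-detail text on stdin and print
# "device util" for every disk whose Pct Util column is >= THRESHOLD.
# Assumes the 14-column layout shown above; flag_busy_disks is a hypothetical name.
flag_busy_disks() {
  local threshold=${1:-50}
  awk -v t="$threshold" '
    /^#Name/           { in_disk = 1; next }   # column header starts the disk block
    in_disk && NF == 0 { in_disk = 0 }         # a blank line ends it
    in_disk && NF == 14 && $14 + 0 >= t + 0 { print $1, $14 }
  '
}
```

For example, `collectl -scnD -oT -p HOSTNAME-20190116-164506.raw.gz | flag_busy_disks 80` should list only the devices that were at least 80% busy in some interval. The threshold of 80 is just an example value.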
Commonly used options

These generate summary data, which is the total of ALL data for a particular type

These generate detail data, typically, but not only, at the device level

The most useful switches are listed here

Final Thoughts

Performance Co-Pilot (PCP) is the preferred tool for collecting comprehensive performance metrics for performance analysis and troubleshooting. It is shipped and supported in Red Hat Enterprise Linux 6 & 7 and is the preferred recommendation over Collectl or Sar/Sysstat. It also includes conversion tools between its own performance data and Collectl & Sar/Sysstat.

[Oct 30, 2018] So how many ibm competitors will want to use ansible now?

Oct 30, 2018 |

Anonymous Coward , 17 hrs

oops there goes ansible

So how many ibm competitors will want to use ansible now?

[Oct 14, 2018] Polling is normally the safest and simplest paradigm, though, because the standard thing is when a file changes, do this

Oct 14, 2018 |

raymorris ( 2726007 ) , Sunday May 27, 2018 @03:35PM ( #56684542 ) Journal

inotify / fswatch ( Score: 5 , Informative)

> Files don't generally call you; for example, you have to poll.

That's called inotify. If you want to be compatible with systems that have something other than inotify, fswatch is a wrapper around various implementations of "call me when a file changes".

Polling is normally the safest and simplest paradigm, though, because the standard thing is "when a file changes, do this". Polling / waiting makes that simple and self-explanatory:

while tail file

The alternative, asynchronously calling the function like this, has a big problem:

when file changes
do something

The biggest problem is that a file can change WHILE you're doing something(), meaning it will restart your function while you're in the middle of it. Re-entrancy carries with it all manner of potential problems. Those problems can be handled if you really know what you're doing, you're careful, and you write a full suite of re-entrant integration tests. Or you can skip all that and just use synchronous I/O, waiting or polling. Neither is the best choice in ALL situations, but very often simplicity is the best choice.
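The synchronous polling approach recommended above can be sketched in bash. It assumes GNU coreutils (`stat -c` and fractional `sleep`). `file_sig`, `poll_file` and the optional round limit are hypothetical. The limit is there only so the loop can also run for a bounded time:

```shell
# file_sig FILE: "mtime size" (GNU stat), or "missing" if FILE does not exist.
file_sig() {
  stat -c '%Y %s' "$1" 2>/dev/null || echo missing
}

# poll_file FILE INTERVAL HANDLER [ROUNDS]
# Checks FILE every INTERVAL seconds and runs "HANDLER FILE" when its signature
# changes. ROUNDS=0 (the default) polls forever. The handler runs synchronously,
# so a change made while it runs is only seen at the next check: no re-entrancy.
poll_file() {
  local file=$1 interval=$2 handler=$3 rounds=${4:-0}
  local last now i=0
  last=$(file_sig "$file")
  while [ "$rounds" -eq 0 ] || [ "$i" -lt "$rounds" ]; do
    sleep "$interval"
    i=$((i + 1))
    now=$(file_sig "$file")
    if [ "$now" != "$last" ]; then
      last=$now
      "$handler" "$file"
    fi
  done
}
```

A typical call would be `poll_file /var/log/messages 5 my_handler`, where `my_handler` is whatever function should react to the change. The mtime has one-second resolution, which is why the size is compared too. A rewrite that keeps both the same size and the same second can still slip through.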

[Nov 12, 2017] Installing Nagios 3.4.4 On CentOS 6.3

Nov 12, 2017 |

Installing Nagios 3.4.4 On CentOS 6.3 Introduction

Nagios is a monitoring tool released under the GPL license. It lets you monitor servers, network hardware (switches, routers, ...) and applications. Many plugins are available, and its large community makes Nagios the biggest open source monitoring tool. This tutorial shows how to install Nagios 3.4.4 on CentOS 6.3.


After installing your CentOS server, you have to disable SELinux and install some packages to make Nagios work.

To disable SELinux, edit /etc/selinux/config and set SELINUX=disabled (the change takes effect after a reboot):

# vi /etc/selinux/config

# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
#     enforcing - SELinux security policy is enforced.
#     permissive - SELinux prints warnings instead of enforcing.
#     disabled - No SELinux policy is loaded.
SELINUX=disabled
# SELINUXTYPE= can take one of these two values:
#     targeted - Targeted processes are protected,
#     mls - Multi Level Security protection.
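The same change can be scripted instead of made in vi. Here is a minimal sketch, assuming GNU sed (`-i` for in-place editing). `set_selinux_mode` is a hypothetical helper name. The anchored pattern `^SELINUX=` leaves the `SELINUXTYPE=` line untouched:

```shell
# set_selinux_mode CONFIG MODE: rewrite the SELINUX= line of an SELinux config file.
# set_selinux_mode is a hypothetical helper; GNU sed -i is assumed.
set_selinux_mode() {
  local conf=$1 mode=$2
  case "$mode" in
    enforcing|permissive|disabled) ;;
    *) echo "invalid SELinux mode: $mode" >&2; return 1 ;;
  esac
  # "^SELINUX=" does not match "SELINUXTYPE=", so only the mode line changes.
  sed -i "s/^SELINUX=.*/SELINUX=$mode/" "$conf"
}
```

Usage: `set_selinux_mode /etc/selinux/config disabled`. To switch the running system to permissive mode right away, use `setenforce 0`. The `disabled` value in the file only applies after a reboot.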

Now install the packages Nagios needs:

# yum install gd gd-devel httpd php gcc glibc glibc-common

Nagios Installation

Create a directory:

# mkdir /root/nagios

Navigate to this directory:

# cd /root/nagios

Download the Nagios Core and Nagios Plugins source archives:

# wget
# wget

Untar nagios core:

# tar xvzf nagios-3.4.4.tar.gz

Go to the nagios dir:

# cd nagios

Configure before make:

# ./configure

Make all necessary files for Nagios:

# make all


# make install

# make install-init

# make install-commandmode

# make install-config

# make install-webconf

Create a password to log into the web interface:

# htpasswd -c /usr/local/nagios/etc/htpasswd.users nagiosadmin

Enable the service at boot and start it:

# chkconfig nagios on
# service nagios start

Now, you have to install the plugins:

# cd ..
# tar xvzf nagios-plugins-1.4.15.tar.gz
# cd nagios-plugins-1.4.15
# ./configure
# make
# make install

Start the apache service and enable it on boot:

# service httpd start
# chkconfig httpd on

Now, connect to your Nagios system:

http://Your-Nagios-IP/nagios and log in as nagiosadmin with the password you chose above.

And after the installation?

After the installation, you have to configure all your hosts and services in the Nagios configuration files. This step is performed on the command line and is complicated, so I recommend installing a tool like Centreon, a nice front-end for adding your hosts and services.

To go further, I recommend reading my article on Nagios & Centreon monitoring.

[Nov 12, 2017] How to Install Nagios 4 in Ubuntu and Debian

Nov 12, 2017 |


  1. Debian 9 Minimal Installation
  2. Ubuntu 16.04 Minimal Installation
Step 1: Install Pre-requirements for Nagios

1. Before installing Nagios Core from source on Ubuntu or Debian, first install the following LAMP stack components on your system (the MySQL database component is not needed) by issuing the command below.

# apt install apache2 libapache2-mod-php7.0 php7.0

2. Next, install the following system dependencies and utilities required to compile and install Nagios Core from source by issuing the following command.

# apt install wget unzip zip  autoconf gcc libc6 make apache2-utils libgd-dev
Step 2: Install Nagios 4 Core in Ubuntu and Debian

3. First, create the nagios system user and group, and add the Apache www-data user to the nagios group, by issuing the commands below.

# useradd nagios
# usermod -a -G nagios www-data

4. Once all the dependencies, packages and system requirements for compiling Nagios from source are present on your system, go to the Nagios webpage and grab the latest stable Nagios Core source archive by issuing the following command.

# wget

5. Next, extract the Nagios tarball and enter the extracted nagios directory with the following commands. Issue the ls command to list the directory's contents.

# tar xzf nagios-4.3.4.tar.gz 
# cd nagios-4.3.4/
# ls
List Nagios Content

6. Now start compiling Nagios from source. Make sure you configure Nagios with the Apache sites-enabled directory by issuing the command below.

# ./configure --with-httpd-conf=/etc/apache2/sites-enabled

7. In the next step, build Nagios files by issuing the following command.

# make all

8. Now, install Nagios binary files, CGI scripts and HTML files by issuing the following command.

# make install

9. Next, install the Nagios daemon init script and the external command mode configuration, and make sure the nagios daemon is enabled system-wide, by issuing the following commands.

# make install-init
# make install-commandmode
# systemctl enable nagios.service

10. Next, install the sample configuration files that Nagios needs in order to run properly by issuing the command below.

# make install-config

11. Also, install the Nagios configuration file for the Apache web server, which can be found in the /etc/apache2/sites-enabled/ directory, by executing the command below.

# make install-webconf

12. Next, create the nagiosadmin account and the password that Apache requires for logging in to the Nagios web panel by issuing the following command.

# htpasswd -c /usr/local/nagios/etc/htpasswd.users nagiosadmin

13. To allow the Apache HTTP server to execute the Nagios CGI scripts and serve the Nagios admin panel over HTTP, enable the cgi module in Apache, restart the Apache service, and then start and enable the Nagios daemon system-wide by issuing the following commands.

# a2enmod cgi
# systemctl restart apache2
# systemctl start nagios
# systemctl enable nagios

14. Finally, log in to the Nagios web interface by pointing a browser at your server's IP address or domain name at the following URL over HTTP. Log in as the nagiosadmin user with the password set with htpasswd.


[Oct 31, 2017] Nagios on Debian primer by Tom Ryder

Jan 26, 2012 |

Nagios is useful for monitoring pretty much any kind of network service, with a wide variety of community-made plugins to test almost anything you might need. However, its configuration and interface can be a little cryptic to newcomers. Fortunately, Nagios is well-packaged in Debian and Ubuntu and provides a basic default configuration that is instructive to read and extend.

There's a reason that a lot of system administrators turn into monitoring fanatics when tools like Nagios are available. The rapid feedback of things going wrong and being fixed and the pleasant sea of green when all your services are up can get addictive for any halfway dedicated administrator.

In this article I'll walk you through installing a very simple monitoring setup on a Debian or Ubuntu server. We'll assume you have two computers in your home network, a workstation on and a server on , and that you maintain a web service of some sort on a remote server, for which I'll use .

We'll install a Nagios instance on the server that monitors both local services and the remote webserver, and emails you if it detects any problems.

For those not running a Debian-based GNU/Linux distribution or perhaps BSD, much of the configuration here will still apply, but the initial setup will probably be peculiar to your ports or packaging system unless you're compiling from source.

Installing the packages

We'll work on a freshly installed Debian Stable box as the server, which at the time of writing is version 6.0.3 "Squeeze". If you don't have it working already, you should start by installing Apache HTTPD:

# apt-get install apache2

Visit the server on and check that you get the "It works!", and that should be all you need. Note that by default this installation of Apache is not terribly secure, so you shouldn't allow access to it from outside your private network until you've locked it down a bit, which is outside the scope of this article.

Next we'll install the nagios3 package, which will include a default set of useful plugins, and a simple configuration. The list of packages it needs to support these is quite long so you may need to install a lot of dependencies, which apt-get will manage for you.

# apt-get install nagios3

The installation procedure will include requesting a password for the administration area; provide it with a suitable one. You may also get prompted to configure a workgroup for the samba-common package; don't worry, you aren't installing a samba service by doing this, it's just information for the smbclient program in case you want to monitor any SMB/CIFS services.

That should provide you with a basic self-monitoring Nagios setup. Visit in your browser to verify this; use the username nagiosadmin and the password you gave during the install process. If you see something like the below, you're in business; this is the Nagios web reporting and administration panel.

The Nagios administration area's front page

Default setup

To start with, click the Services link in the left menu. You should see something like the below, which is the monitoring for localhost and the service monitoring that the packager set up for you by default:

Default Nagios monitoring hosts and services

Note that on my system, monitoring for the already-existing HTTP and SSH daemons was automatically set up for me, along with the default checks for load average, user count, and process count. If any of these pass a threshold, they'll turn yellow for WARNING, and red for CRITICAL states.
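Behind the green, yellow and red states is a simple contract. Every Nagios check is a program that prints one status line and reports the state through its exit code: 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN. Here is a minimal sketch of a load-average check written to that contract, assuming Linux /proc/loadavg. `check_load_sketch` is hypothetical and is not the check_load plugin that ships with the Nagios plugins package:

```shell
# check_load_sketch WARN CRIT [LOADAVG_FILE]
# Follows the Nagios plugin convention: print one status line (with performance
# data after "|") and return 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN.
check_load_sketch() {
  local warn=$1 crit=$2 src=${3:-/proc/loadavg}
  local load=""
  if [ -r "$src" ]; then
    read -r load _ < "$src" || true   # first field is the 1-minute load average
  fi
  if [ -z "$load" ]; then
    echo "LOAD UNKNOWN - cannot read 1-minute load from $src"
    return 3
  fi
  local perf="load1=$load;$warn;$crit"
  # bash has no floating-point comparison, so delegate it to awk.
  if awk -v l="$load" -v c="$crit" 'BEGIN { exit !(l + 0 >= c + 0) }'; then
    echo "LOAD CRITICAL - load1=$load | $perf"
    return 2
  elif awk -v l="$load" -v w="$warn" 'BEGIN { exit !(l + 0 >= w + 0) }'; then
    echo "LOAD WARNING - load1=$load | $perf"
    return 1
  fi
  echo "LOAD OK - load1=$load | $perf"
  return 0
}
```

In a standalone plugin file, the script would end with `check_load_sketch "$@"; exit $?` so that Nagios sees the exit status.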

This is already somewhat useful, though a server monitoring itself is a bit problematic because of course it won't be able to tell you if it goes completely down. So for the next step, we're going to set up monitoring for the remote host , which means firing up your favourite text editor to edit a few configuration files.

Default configuration

Nagios configuration is at first blush a bit complex, because monitoring setups need to be quite finely-tuned in order to be useful long term, particularly if you're managing a large number of hosts. Take a look at the files in /etc/nagios3/conf.d .

# ls /etc/nagios3/conf.d

You can actually arrange a Nagios configuration any way you like, including one big well-ordered file, but it makes some sense to break it up into sections if you can. In this case, the default setup includes the following files:

This isn't my favourite method of organising Nagios configuration, but it'll work fine for us. We'll start by defining a remote host, and add services to it.

Testing services

First of all, let's check that we actually have connectivity from this server to the host we're monitoring for both of the services we intend to check: ICMP ECHO (PING) and HTTP.

$ ping -n -c 1
PING ( 56(84) bytes of data.
64 bytes from icmp_req=1 ttl=243 time=168 ms
--- ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 168.700/168.700/168.700/0.000 ms

$ wget -O - | grep -i found
--2012-01-26 21:12:00--
Resolving, 2001:500:88:200::10
Connecting to||:80... connected.
HTTP request sent, awaiting response... 302 Found

All looks well, so we'll go ahead and add the host and its services.

Defining the remote host

Write a new file in the /etc/nagios3/conf.d directory called www.example.com_nagios2.cfg, with the following contents:

define host {
    use        generic-host
    host_name  www.example.com
    alias      www.example.com
    address    www.example.com
}

The first stanza of localhost_nagios2.cfg looks very similar to this; indeed, it uses the same host template, generic-host. All we need to do is define what to call the host and where to find it.

However, in order to get it monitoring appropriate services, we might need to add it to one of the already existing groups. Open up hostgroups_nagios2.cfg , and look for the stanza that includes hostgroup_name http-servers . Add to the group's members, so that that stanza looks like this:

# A list of your web servers
define hostgroup {
    hostgroup_name  http-servers
    alias           HTTP servers
    members         localhost,www.example.com
}

With this done, you need to restart the Nagios process:

# service nagios3 restart

If that succeeds, you should notice under your Hosts and Services section is a new host called "", and it's being monitored for HTTP. At first, it'll be PENDING, but when the scheduled check runs, it should come back (hopefully!) as OK.

[May 4, 2014] 20 Command Line Tools to Monitor Linux Performance By Ravi Saive

April 27, 2014 |

... ... ...

6. Htop – Linux Process Monitoring

Htop is a much more advanced interactive, real-time Linux process monitoring tool. It is very similar to the Linux top command, but it has some rich features like a user-friendly interface for managing processes, shortcut keys, vertical and horizontal views of the processes and much more. Htop is a third-party tool and isn't included in Linux systems by default; you need to install it using the YUM package manager. For more information on installation, read our article below.

7. Iotop – Monitor Linux Disk I/O

Iotop is also similar to the top command and the Htop program, but it has an accounting function to monitor and display real-time disk I/O per process. This tool is very useful for finding exactly which processes are doing heavy disk reads and writes.

... ... ...

9. IPTraf – Real Time IP LAN Monitoring

IPTraf is an open source, console-based, real-time network (IP LAN) monitoring utility for Linux. It collects a variety of information, such as the IP traffic passing over the network, including TCP flag information, ICMP details, TCP/UDP traffic breakdowns, and TCP connection packet and byte counts. It also gathers general and detailed interface statistics for TCP, UDP, IP, ICMP, non-IP, IP checksum errors, interface activity, etc.

10. Psacct or Acct – Monitor User Activity

psacct or acct tools are very useful for monitoring each user's activity on the system. Both daemons run in the background and keep a close watch on the overall activity of each user on the system and on the resources they consume.

These tools are very useful for system administrators who need to track each user's activity: what they are doing, what commands they issued, how many resources they used, how long they are active on the system, etc.

For installation and example usage of commands read the article on Monitor User Activity with psacct or acct

11. Monit – Linux Process and Services Monitoring

Monit is a free, open source, web-based process supervision utility that automatically monitors and manages system processes, programs, files, directories, permissions, checksums and filesystems.

It monitors services like Apache, MySQL, Mail, FTP, ProFTP, Nginx, SSH and so on. The system status can be viewed from the command line or through its own web interface.

12. NetHogs – Monitor Per Process Network Bandwidth

NetHogs is a small open source program (similar to the Linux top command) that keeps tabs on each process's network activity on your system. It also keeps track of the real-time network bandwidth used by each program or application.
NetHogs Linux Bandwidth Monitoring

Read More : Monitor Linux Network Bandwidth Using NetHogs

13. iftop – Network Bandwidth Monitoring

iftop is another terminal-based, free, open source system monitoring utility. It displays a frequently updated list of network bandwidth utilization (by source and destination host) for the traffic passing through a network interface on your system. iftop does for network usage what top does for CPU usage: it is a 'top' family tool that monitors a selected interface and displays the current bandwidth usage between pairs of hosts.

14. Monitorix – System and Network Monitoring

Monitorix is a free, lightweight utility designed to monitor as many system and network resources as possible on Linux/Unix servers. It has a built-in HTTP web server and regularly collects system and network information, which it displays as graphs. It monitors system load average and usage, memory allocation, disk drive health, system services, network ports, mail statistics (Sendmail, Postfix, Dovecot, etc.), MySQL statistics and much more. It is designed to monitor overall system performance and helps in detecting failures, bottlenecks, abnormal activities, etc.

... ... ...

[Apr 19, 2013] Monitorix (A Lightweight System and Network) Monitoring Tool for Linux By Ravi Saive

April 17, 2013

Monitorix is a monitoring tool written in Perl and licensed under the GNU GPL. It collects server and network data and displays the information in graphs using its own web interface. Monitorix lets you monitor overall system performance and also helps in detecting bottlenecks, failures, unwanted long response times and other abnormal activities.

It uses RRDtool to generate graphs and display them using web interface.

This tool was created specifically for monitoring Red Hat, CentOS and Fedora based Linux systems, but it can run on other flavours of Unix too.


  1. System load average, active processes, per-processor kernel usage, global kernel usage and memory allocation.
  2. Monitors Disk drive temperatures and health.
  3. Filesystem usage and I/O activity of filesystems.
  4. Network traffic usage up to 10 network devices.
  5. System services including SSH, FTP, Vsftpd, ProFTP, SMTP, POP3, IMAP, VirusMail and Spam.
  6. MTA Mail statistics including input and output connections.
  7. Network port traffic including TCP, UDP, etc.
  8. FTP statistics with log file formats of FTP servers.
  9. Apache statistics of local or remote servers.
  10. MySQL statistics of local or remote servers.
  11. Squid Proxy Web Cache statistics.
  12. Fail2ban statistics.
  13. Monitor remote servers (Multihost).
  14. Ability to view statistics in graphs or in plain text tables per day, week, month or year.
  15. Ability to zoom graphs for better view.
  16. Ability to define the number of graphs per row.
  17. Built-in HTTP server.

For a full list of new features and updates, please check out the official feature page.

[Jun 21, 2011] Monitorix

Perl-based project
Monitorix is a lightweight system monitoring tool designed to monitor as many services and system resources as possible. It has been created to be used under production UNIX/Linux servers, but due to its simplicity and small size you may also use it on embedded devices as well. It mainly consists of two programs: a collector called monitorix, which is a Perl daemon that is started automatically like any other system service, and a CGI script called monitorix.cgi.

[Mar 22, 2011] monitoring-nagios-icinga-opsview

[Aug 22, 2010] Introducing Operations Manager on Linux

See also HP Operations Manager and HP Operations Manager 9 (HP OM or OVO) Installation on Red Hat 5.5
Aug 8, 2009 | HP Blogs

HP Operations Manager has long had the ability to monitor Linux servers. We are now getting ready to release a version of Operations Manager that runs on Linux. This complements our existing Operations Manager on Windows (OMW) and Operations Manager on Unix (OMU).

[Aug 21, 2010] Hosted Server Monitoring

Ruby-based monitoring system that stresses simplicity and elegance...


"Scout offers customization and extensibility without extra overhead, doing just what you need and then getting out of your way. It's just the kind of app our customers love, a simple solution for a complicated problem."

- Dan Benjamin, Hivelogic

"Scout is the first server monitoring tool to find the right balance of simplicity and flexibility."

- Hampton Catlin, Unspace

"Scout brilliantly eliminates the hassle of manually installing and updating monitoring scripts on each of our servers."

- Nick Pearson, Banyan Theory

"I wrote a Scout plugin in about ten minutes - it's as simple as writing Ruby. And since I'm in love with Ruby, naturally, Scout is my new favorite tool for keeping an eye on my servers."

- Tim Morgan, Tulsa Ruby User Group

"Support is excellent: swift, friendly, and helpful."

- Andrew Stewart, AirBlade Software

"Server performance problems are notoriously hard to anticipate or reproduce. Scout's long memory and clean graphs make it an awesome tool for collecting and analyzing performance metrics of all kinds."

- Lance Ivy, UserVoice

***** [Oct 17, 2009] Open Source Management Options

I think this is one of the best publicly available reviews of three products: Nagios, OpenNMS and Zenoss.

This is a major review of Open Source products for Network and Systems Management.

The paper is available as a PDF file:

Note that the file is fairly large (18MB).

See also Systems and Network Management Skills 1st Ltd

[Aug 1, 2009] Deploying Nagios in a Large Enterprise Environment, at USENIX LISA '07

Interesting presentation about splitting Nagios into multiple domains (I think that every 100 servers require a separate instance if the checks are extensive) and using passive checks to avoid the bottleneck of "agentless probes". Configuration file generation can be a big help if the servers are similar. A large deployment requires configuration management of the Nagios config files. It's interesting how they "reinvented the bicycle" for some concepts, like querying of alerts, that should be in an enterprise monitoring system from the very beginning :-)
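Generating configuration from a list of similar servers, as the presentation suggests, takes only a few lines. Here is a minimal sketch. `gen_nagios_hosts` is a hypothetical helper that reads one hostname per line and emits a host definition for each, reusing the `generic-host` template from the Debian sample configuration by default. A real deployment would add services, host groups and the per-domain split on top of this:

```shell
# gen_nagios_hosts [TEMPLATE] < HOSTLIST
# Reads one hostname per line (blank lines and # comments are skipped) and prints
# a Nagios "define host" stanza for each, using TEMPLATE (default generic-host).
# The hostname doubles as the address, so it must resolve via DNS or /etc/hosts.
gen_nagios_hosts() {
  local template=${1:-generic-host} host
  while read -r host || [ -n "$host" ]; do
    host=${host%%[[:space:]]*}   # keep only the first word on the line
    case "$host" in
      '' | '#'*) continue ;;
    esac
    printf 'define host {\n    use        %s\n    host_name  %s\n    address    %s\n}\n\n' \
      "$template" "$host" "$host"
  done
}
```

For example, `gen_nagios_hosts < webservers.txt > /etc/nagios3/conf.d/generated_hosts.cfg` followed by a Nagios restart. Both file names here are illustrative.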

[Apr 29, 2009] Cool Solutions Nagios 3.0 - A Extensible Host and Service Monitoring

Nagios is included in SLES 11 and is supported by Novell. It can be installed directly from the Novell RPM repository.
Oct 19, 2007 | Novell

Nagios is a popular host and service monitoring tool used by many administrators to keep an eye on their systems.

Since I wrote a basic installation guide on Cool Solutions in Jan 2006, many new versions have been published and many Nagios plugins are now available. Because of that, I think it's time to write a series of articles here that show you some very interesting solutions. I hope that you find them helpful and that you can use them in your environment. If you are not yet a Nagios user, I hope that I can inspire you to give it a try.

I don't want to write full documentation about Nagios here; I prefer to give you a basic installation guide so you can set it up very easily and play with it yourself. The installation guide will show you how to install Nagios as well as some interesting extensions and how they integrate with each other. During this installation you will make many modifications that will help you understand how it works and how you can integrate systems and different services. I will also provide some articles about monitoring special services, where I describe what they do and what configuration changes are needed. Altogether this should give you a very good overview of how you can enhance the Nagios installation yourself.

If you would like to read some detailed information about Nagios visit the documentation at the project homepage at or go through my short article from Jan 2006 at

Munin - Trac

Munin, the monitoring tool, surveys all your computers and remembers what it saw. It presents all the information in graphs through a web interface. Its emphasis is on plug-and-play capabilities: after completing an installation, a large number of monitoring plugins will be running with no further effort.

Using Munin you can easily monitor the performance of your computers, networks, SANs, applications, weather measurements and whatever comes to mind. It makes it easy to determine "what's different today" when a performance problem crops up. It makes it easy to see how you're doing capacity-wise on any resources.

Munin uses the excellent RRDTool (written by Tobi Oetiker) and the framework is written in Perl, while plugins may be written in any language. Munin has a master/node architecture in which the master connects to all the nodes at regular intervals and asks them for data. It then stores the data in RRD files, and (if needed) updates the graphs. One of the main goals has been ease of creating new plugins (graphs).
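Plugins may be written in any language because a Munin plugin is just an executable that answers two kinds of calls. As I understand the Munin plugin protocol, munin-node runs the plugin with the argument `config` to get the graph description (graph_title, graph_vlabel and a `.label` per field). It runs the plugin with no argument to fetch current values as `field.value N` lines. Here is a minimal sketch for the 1-minute load average, assuming Linux /proc/loadavg. `munin_load_sketch` is a hypothetical name, and Munin already ships its own load plugin:

```shell
# munin_load_sketch [config|fetch] [LOADAVG_FILE]
# "config" prints the graph description; no argument (or "fetch") prints the value.
munin_load_sketch() {
  local mode=${1:-fetch} src=${2:-/proc/loadavg} load
  if [ "$mode" = "config" ]; then
    echo "graph_title Load average (sketch)"
    echo "graph_vlabel load"
    echo "graph_category system"
    echo "load.label load"
    return 0
  fi
  read -r load _ < "$src"   # first field is the 1-minute load average
  echo "load.value $load"
}
```

As an installed plugin, the executable file would simply end with `munin_load_sketch "$@"`. The master then collects the values from each node at its regular interval and stores them in RRD files.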

This site is a wiki as well as a project management tool. We appreciate any contributions to the documentation. While this is the homepage of the Munin project, we will still make all releases through Sourceforge.

Open Source Enterprise Monitoring Systems by Corey Goldberg

I used Nagios for health/performance monitoring of devices/servers for years at a previous job. It has been a while, and I'm starting to look into this space again. There are a lot more options out there for remote monitoring these days.

Here is what I have found that look good:

Do you know of any others I am missing? I'll update this list if I get replies. The requirement is that there must be an Open Source version of the tool.


aetius said...
OpenNMS. Might be more than you need, but it's fully open source.
Todd said...
Opsview is another one
sysadim guy said...
We use nagios2 installed from ubuntu 804 package.

We are planning to update to nagios 3, which is available in ubuntu 810.

There are some nice addons like

The best asset for nagios in our case is that it's very easy to develop new plugins. We complement this with some centralized administrative tool which allows us to deploy new plugins or change parameters: cfengine (for *nix) or SCCM 2007 for MS.

Corey Goldberg said...
@sysadim guy:

yea I really like Nagios a lot. I developed the WebInject plugin for it to monitor websites. My plugin is pretty popular:

Still haven't tried Nagios 3 yet

Peter B. said...
I found the following slideshare presentation on monitoring systems very helpful

Also, dude, the webinject forum isn't working: e.g.;action=display;num=1201702796

[Sep 3, 2008] TraffStats 0.11.3 by Klaus Zerwes

Sep 3, 2008 |

About: TraffStats is a monitoring and traffic analysis application that uses SNMP to collect data from any enabled device. It has the ability to generate graphs (using jpgraph) with the option to compare and sum up different devices. It has a multiuser-design with rights-management and support for multiple languages.

[Aug 27, 2008] MUSCLE 4.28 by Jeremy Friesner

About: MUSCLE (Multi User Server Client Linking Environment) is an N-way messaging server and networking API. It includes client-side networking APIs for various languages, including C, C++, C#, Delphi, Java, and Python. MUSCLE lets programs communicate over a network via streams of serialized Message objects. The included server program ("muscled") lets its clients message each other and store information in its server-side hierarchical database. The database supports flexible queries via hierarchical wildcarding, and "live" updates via a subscription mechanism.

Changes: This release compiles again under Win32. A fork() vs forkpty() option has been added to the ChildProcessDataIO class. Directory and FilePathInfo classes have been added. There are other minor changes.

[Jul 17, 2008] fsheal

Useful Perl script

FSHeal aims to be a general filesystem tool that can scan and report vital "defective" information about the filesystem like broken symlinks, forgotten backup files, and left-over object files, but also source files, documentation files, user documents, and so on.

It scans the filesystem without modifying anything and reports all the data to a logfile specified by the user, which can then be reviewed and actions taken accordingly.
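For a few of the categories it mentions, the read-only scan that FSHeal performs can be approximated with find(1). This is not FSHeal itself. It is a minimal sketch that assumes GNU find (`-xtype` and `-printf`). `fs_leftovers` is a hypothetical name, and the backup patterns (`*~`, `*.bak`) are an assumption about what counts as a forgotten backup file:

```shell
# fs_leftovers [DIR]: report broken symlinks, editor/backup files and stray
# object files under DIR without modifying anything (GNU find assumed).
fs_leftovers() {
  local dir=${1:-.}
  # -xtype l is true for symlinks whose target cannot be resolved.
  find "$dir" -xtype l -printf 'broken-symlink %p\n'
  find "$dir" -type f \( -name '*~' -o -name '*.bak' \) -printf 'backup %p\n'
  find "$dir" -type f -name '*.o' -printf 'object %p\n'
}
```

Redirecting the output to a file, e.g. `fs_leftovers /home > /tmp/leftovers.log`, gives the same review-then-act workflow described above.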

[Jul 16, 2008] httping 1.2.9 by Folkert van Heusden

About: httping is a "ping"-like tool for HTTP requests. Give it a URL and it will show how long it takes to connect, send a request, and retrieve the reply (only the headers). It can be used for monitoring or statistical purposes (measuring latency).

Changes: Binding to an adapter did not work and "SIGPIPE" was not handled correctly. Both of these problems were fixed.

[Jun 25, 2008] check_oracle_health

About: check_oracle_health is a plugin for the Nagios monitoring software that allows you to monitor various metrics of an Oracle database. It includes connection time, SGA data buffer hit ratio, SGA library cache hit ratio, SGA dictionary cache hit ratio, SGA shared pool free, PGA in memory sort ratio, tablespace usage, tablespace fragmentation, tablespace I/O balance, invalid objects, and many more.

Release focus: Major feature enhancements

Changes: The tablespace-usage mode now takes into account when tablespaces use autoextents. The data-buffer/library/dictionary-cache-hitratio are now more accurate. Sqlplus can now be used instead of DBD::Oracle.

[Jun 11, 2008] check_lm_sensors 3.1.0 by Matteo Corti

About: check_lm_sensors is a Nagios plugin to monitor the values of on-board sensors and hard disk temperatures on Linux systems.

Changes: The plugin now uses the standard Nagios::Plugin CPAN classes, fixing issues with embedded perl.

[May 6, 2008] Ortro 1.3.0 by Luca Corbo

PHP based

About: Ortro is a framework for enterprise scheduling and monitoring. It allows you to easily assemble jobs to perform workflows and run existing scripts on remote hosts in a secure way using ssh. It also tests your Web applications, creates simple reports using queries from databases (in HTML, text, CSV, or XLS), emails them, and sends notifications of job results using email, SMS, Tibco Rvd, Tivoli postemsg, or Jabber.

Changes: Key features such as auto-discovery of hosts and import/export tools are now available. The telnet plugin was improved and the mail plugin was updated. The PEAR libraries were updated.

[May 6, 2008] check_logfiles

Perl plugin.

check_logfiles 2.3.3 (added March 12, 2006; updated May 6, 2008)

check_logfiles is a plugin for Nagios which checks logfiles for defined patterns. It is capable of detecting logfile rotation. If you tell it how the rotated archives look, it will also examine these files. Traditional logfile plugins were not aware of the gap that rotation can open between two checks, so under some circumstances they missed whatever was logged in between. A configuration file is used to specify where to search, what to search for, and what to do if a matching line is found.
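The rotation-aware scanning described above can be sketched in a few lines of Python. The sketch remembers the inode and byte offset from the previous run. A changed inode or a shrunken file is treated as rotation or truncation. If the caller names the rotated archive, the unread tail of the old file is scanned first, which closes the gap between checks.

This is not check_logfiles' code. The state dictionary, the `rotated_path` parameter, and the function names are hypothetical. On the first run (empty state) the whole file is scanned.

```python
import os
import re

OK, CRITICAL, UNKNOWN = 0, 2, 3  # standard Nagios plugin exit codes


def _scan(path, regex, offset):
    """Return (matching lines, end offset) for path, starting at byte offset."""
    found = []
    with open(path, "rb") as fh:
        fh.seek(offset)
        for raw in fh:
            line = raw.decode("utf-8", errors="replace").rstrip("\r\n")
            if regex.search(line):
                found.append(line)
        return found, fh.tell()


def scan_log(path, pattern, state, rotated_path=None):
    """One check run; `state` is {'inode': ..., 'offset': ...} from the previous run."""
    regex = re.compile(pattern)
    try:
        st = os.stat(path)
    except OSError:
        return UNKNOWN, [], state
    offset = state.get("offset", 0)
    old_inode = state.get("inode")
    matches = []
    if (old_inode is not None and old_inode != st.st_ino) or st.st_size < offset:
        # Rotated (new inode) or truncated. If the archive is the file we were
        # reading last time, finish it first so nothing between checks is lost.
        if (rotated_path and os.path.exists(rotated_path)
                and os.stat(rotated_path).st_ino == old_inode):
            matches += _scan(rotated_path, regex, offset)[0]
        offset = 0
    new_matches, end = _scan(path, regex, offset)
    matches += new_matches
    status = CRITICAL if matches else OK
    return status, matches, {"inode": st.st_ino, "offset": end}
```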

[May 5, 2008] Plash 1.19 by mseaborn

About: Plash is a sandbox for running GNU/Linux programs with minimum privileges. It is suitable for running both command line and GUI programs. It can dynamically grant Gtk-based GUI applications access rights to individual files that you want to open or edit. This happens transparently through the Open/Save file chooser dialog box, by replacing GtkFileChooserDialog. Plash virtualizes the file namespace and provides per-process/per-sandbox namespaces. It can grant processes read-only or read-write access to specific files and directories, mapped at any point in the filesystem namespace. It does not require modifications to the Linux kernel.

Changes: The build system for PlashGlibc has been changed to integrate better with glibc's normal build process. As a result, it is easier to build Plash on architectures other than i386, and this is the first release to support AMD-64. The forwarding of stdin/stdout/stderr that was introduced in the previous release caused a number of bugs that should now be fixed.

[May 5, 2008] Tcpreplay 3.3.0 (Stable) by Aaron Turner

About: Tcpreplay is a set of Unix tools which allows the editing and replaying of captured network traffic in pcap (tcpdump) format. It can be used to test a variety of passive and inline network devices, including IPSes, UTMs, routers, firewalls, and NIDS.

Changes: This release dramatically improves packet timing, introduces full fragroute support in tcprewrite, and improves Windows/Cygwin and FreeBSD support. Additionally, a number of smaller enhancements have been made and user-discovered bugs have been resolved. All users are strongly encouraged to update.

[Apr 18, 2008] openQRM

Qlusters, maker of the open source systems management software openQRM, announced last week that the most recent release of its openQRM systems management software would be the last from Qlusters.

[Apr 18, 2008] An Introduction to openQRM by Kris Buytaert

Imagine managing virtual machines and physical machines from the same console and creating pools of machines booted from identical images, one taking over from the other when needed. Imagine booting virtual nodes from the same remote iSCSI disk as physical nodes. Imagine having those tools integrated with Nagios and Webmin.

Remember the nightmare you ran into when having to build and deploy new kernels, or redeploy an image on different hardware? Stop worrying. Stop imagining. openQRM can do all of this.

openQRM, which just reached version 3.1, is an open source cluster resource management platform for physical and virtual data centers. In a previous life it was a proprietary project. Now it's open source and is succeeding in integrating different leading open source projects into one console. With a pluggable architecture, there is more to come. I've called it "cluster resource management," but it's really a platform to manage your infrastructure.

Whether you are deploying Xen, Qemu, VMware, or even just physical machines, openQRM can help you manage your environment.

This article explains the key concepts of openQRM.

openQRM consists mainly of four components:

[Mar 18, 2008] Open (Source|System) Monitoring and Reporting Tool 1.2 by Ulrich Herbst

About: OpenSMART is a monitoring (and reporting) environment for servers and applications in a network. Its main features are a nice Web front end, monitored servers requiring only a Perl installation, XML configuration, and good documentation. It is easy to write more checks. Supported platforms are Linux, HP/UX, Solaris, AIX, *BSD, and Windows (only as a client).

Changes: New checks include mqconnect, which tests if a connection to a WebSphere MQ QueueManager is possible; mysqlconnect, which tests if a connection to a MySQL database is possible; readfile, which tests if a file in a (potentially network-based) filesystem is readable; and db2lck, which tests if there are critical lock situations on your DB2 database. Many bugs were fixed. A username and password can be specified. Recursive include functionality was added for osagent.conf.xml. Major performance improvements were made.

[Feb 26, 2008] dstat

dstat is a versatile replacement for vmstat, iostat, netstat, nfsstat, and ifstat. It includes various counters (in separate plugins) and allows you to select and view all of your system resources instantly; you can, for example, compare disk usage in combination with interrupts from your IDE controller, or compare the network bandwidth numbers directly with the disk throughput (in the same interval).

Release focus: Major feature enhancements

Various improvements were made to internal infrastructure. C plugins are now possible too. New topcpu, topmem, topio/topbio, and topoom process plugins were added along with new innodb, mysql, and mysql5 application plugins. A new vmknic VMware plugin was added. Various fixes and improvements were made to plugins and output.

Dag Wieers

[Feb 20, 2008] collectd 4.3.0 by Florian Forster

About: collectd is a small and modular daemon which collects system information periodically and provides means to store the values. Included in the distribution are numerous plug-ins for collecting CPU, disk, and memory usage, network interface and DNS traffic, network latency, database statistics, and much more. Custom statistics can easily be added in a number of ways, including execution of arbitrary programs and plug-ins written in Perl. Advanced features include a powerful network code to collect statistics for entire setups and SNMP integration to query network equipment.

Changes: Simple threshold checking and notifications have been added to the daemon. The hostname can now be set to the FQDN automatically. Inclusion files have been made more flexible by allowing shell wildcards and including entire directories. The new libvirt plugin is able to collect some statistics about virtual guest systems without additional software on the guests themselves. The perl plugin has been improved a lot: it can now handle multiple threads and is no longer considered experimental. The csv plugin can now convert counter values to rates.
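Counter values such as interface byte counts only ever increase, so turning them into rates means dividing the difference between two readings by the interval. The sketch below shows that conversion, including the usual treatment of a counter that wraps around at its bit width. It illustrates the general technique and is not collectd's code.

The function name is hypothetical. A counter reset after a reboot would also look like a wrap here, and real tools have to handle that separately.

```python
def counter_to_rate(previous: int, current: int, seconds: float, width: int = 32) -> float:
    """Per-second rate between two readings of a monotonically increasing counter.

    If the counter went backwards, assume it wrapped around exactly once at
    2**width (32-bit counters wrap quickly on busy network interfaces).
    """
    if seconds <= 0:
        raise ValueError("interval must be positive")
    if current >= previous:
        delta = current - previous
    else:
        delta = (2 ** width - previous) + current
    return delta / seconds
```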

[Feb 1, 2008] SSH Factory 3.3

SSH can be controlled via tools like Expect too.

About: SSH Factory is a set of Java based client components for communicating with SSH and telnet servers. Including both SSH (Secure Shell) and telnet components, developers will appreciate the easy-to-use API making it possible to communicate with a remote server using just a few lines of code. In addition, SSH Factory includes a full-featured scripting API and easy to use scripting language. This allows developers to build and automate complex tasks with a minimum amount of effort.

Changes: The SshTask and TelnetTask classes were updated so that when the cancel() method is invoked, the underlying thread is stopped without delay. Timeout support was improved in SSH and telnet related classes. The com.jscape.inet.ipclientssh.SshTunneler class was added for use in creating local port forwarding SSH tunnels. Proxy support was improved so that proxy data is no longer applied to the entire JVM. HTTP proxy support was added.

[Jan 6, 2008] sysstat 8.0.4 by Sébastien Godard

About: The sysstat package contains the sar, sadf, iostat, mpstat, and pidstat commands for Linux.

The sar command collects and reports system activity information. The statistics reported by sar concern I/O transfer rates, paging activity, process-related activities, interrupts, network activity, memory and swap space utilization, CPU utilization, kernel activities, and TTY statistics, among others.

The sadf command may be used to display data collected by sar in various formats. The iostat command reports CPU statistics and I/O statistics for tty devices and disks.

The pidstat command reports statistics for Linux processes. The mpstat command reports global and per-processor statistics.
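On Linux, CPU utilization figures like those from sar and mpstat are derived from cumulative tick counters that the kernel exposes in /proc/stat. A tool takes two readings, subtracts them, and divides. The sketch below shows that calculation for the aggregate `cpu` line. It is an illustration of the technique, not sysstat's code, and the function names are hypothetical.

The sketch treats idle plus iowait as not busy. It sums only the first eight fields (user through steal), on the understanding that guest time is already counted in user and nice.

```python
import time


def read_cpu_line(path: str = "/proc/stat") -> str:
    """Return the aggregate 'cpu' line (the first line of /proc/stat on Linux)."""
    with open(path) as fh:
        return fh.readline()


def _idle_and_total(line: str):
    fields = [int(x) for x in line.split()[1:]]
    # Field order: user nice system idle iowait irq softirq steal guest guest_nice
    idle = fields[3] + (fields[4] if len(fields) > 4 else 0)
    total = sum(fields[:8])  # guest/guest_nice are already included in user/nice
    return idle, total


def cpu_busy_percent(first: str, second: str) -> float:
    """Busy percentage between two 'cpu' lines taken some interval apart."""
    idle1, total1 = _idle_and_total(first)
    idle2, total2 = _idle_and_total(second)
    elapsed = total2 - total1
    if elapsed <= 0:
        raise ValueError("second sample must be later than the first")
    return 100.0 * (1.0 - (idle2 - idle1) / elapsed)


def sample_cpu(interval: float = 1.0) -> float:
    """Linux only: CPU busy percentage over `interval` seconds."""
    first = read_cpu_line()
    time.sleep(interval)
    return cpu_busy_percent(first, read_cpu_line())
```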

Changes: This version takes account of all memory zone types when calculating pgscank, pgscand, and pgsteal displayed by sar -B. An XML Schema was added. NLS was updated, adding Dutch, Brazilian Portuguese, Vietnamese, and Kirghiz translations.

[Nov 6, 2007] sarvant

sarvant analyzes files from the sysstat utility "sar" and produces graphs of the collected data using gnuplot. It supports user-defined data source collection, debugging, start and end times, interval counting, and output types (Postscript, PDF, and PNG). It's also capable of using gnuplot's graph smoothing capability to soften spiked line graphs. It can analyze performance data over both short and long periods of time.

[Nov 6, 2007] SYSSTAT tutorial

You will find here a tutorial describing a few use cases for some sysstat commands. The first section below concerns the sar and sadf commands. The second one concerns the pidstat command. Of course, you should really have a look at the manual pages to know all the features and how these commands can help you to monitor your system (follow the Documentation link above for that).

  1. Section 1: Using sar and sadf
  2. Section 2: Using pidstat

[Aug 20, 2007] OpenEsm - What is OpenESM

Zabbix-based monitoring solution. It has a Tivoli event adapter written in Perl: the OpenESM Universal Tivoli Enterprise Console Event Adapter.

Right now, OpenESM has OpenESM for Monitoring v1.3. This release of the software is a combination of Zabbix, Apache, Simple Event Correlation and MySQL. Out of the box, we provide monitoring - warehousing of monitoring data - SLA reporting - correlation and notification. We offer the source code, but we also have a VMware-based appliance.
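The "Simple Event Correlation" component presumably refers to SEC (Simple Event Correlator), whose rules combine events over time. A basic rule of that kind counts matching events inside a sliding time window and fires an action at a threshold. SEC's SingleWithThreshold rule type is one example.

The Python sketch below is a simplified model of that idea, not SEC's syntax or implementation, and the class name is hypothetical. Unlike SEC, after firing it simply starts counting again from zero.

```python
from collections import deque


class ThresholdRule:
    """Fire when `threshold` matching events arrive within `window` seconds.

    Hypothetical, simplified model of a count-within-window correlation rule.
    Timestamps are expected in non-decreasing order.
    """

    def __init__(self, threshold: int, window: float):
        self.threshold = threshold
        self.window = window
        self.times = deque()  # timestamps of matching events still inside the window

    def feed(self, timestamp: float) -> bool:
        """Register one matching event; return True if the action should fire now."""
        self.times.append(timestamp)
        # Drop events that have slid out of the time window.
        while timestamp - self.times[0] > self.window:
            self.times.popleft()
        if len(self.times) >= self.threshold:
            self.times.clear()  # simplified reset after firing
            return True
        return False
```

One instance per key (for example, per source host) would turn this into a rule like "alert on three failed logins from the same host within a minute".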

[Aug 10, 2007] Argus - System and Network Monitoring Software

Another Perl-based package. It concentrates on TCP/IP-based monitoring of remote hosts.

First, thanks for writing something that seems to be clean and easy to extend. I have been using Nagios @ work for some time and am anxious to replace it.

Richard F. Rebel -

Very nice -- we're just starting to test Argus for a small monitoring job, and so far it seems useful. Thanks for your contribution to the open source community.

Andre van Eyssen -

thanks great tool!!

Sorin Esanu -

I am really happy with your soft, it is probably one of the best i have never found!
I own a hosting and this tool has been really cool for my business :)

Raul Mate Galan -

Argus works excellently. We use it to log data about all traffic through our router so that we can produce bandwidth usage statistics for customers.

Geoff Powell -

[Aug 2, 2007] Conky - a light weight system monitor for Ubuntu Linux Systems

Ubuntu Geek

Conky is an advanced, highly configurable system monitor for X based on torsmo. Conky is a powerful desktop app that posts system monitoring info onto the root window. It is hard to set up properly: it has unlisted dependencies and special command-line compile options, it requires a modification to xorg.conf to stop it from flickering, and the apt-get version doesn't work properly. Most people can't get it working right, but it's an AWESOME app if it is set up right.

[Jul 30, 2007] Monitoring Debian Servers Using Monit -- Debian Admin

Looks like dead wood: C-based application.
monit is a utility for managing and monitoring processes, files, directories, and devices on a UNIX system. Monit conducts automatic maintenance and repair and can execute meaningful causal actions in error situations.

Monit Features

* Daemon mode - poll programs at a specified interval
* Monitoring modes - active, passive or manual
* Start, stop and restart of programs
* Group and manage groups of programs
* Process dependency definition
* Logging to syslog or own logfile
* Configuration - comprehensive controlfile
* Runtime and TCP/IP port checking (tcp and udp)
* SSL support for port checking
* Unix domain socket checking
* Process status and process timeout
* Process cpu usage
* Process memory usage
* Process zombie check
* Check the systems load average
* Check a file or directory timestamp
* Alert, stop or restart a process based on its characteristics
* MD5 checksum for programs started and stopped by monit
* Alert notification for program timeout, restart, checksum, stop resource and timestamp error
* Flexible and customizable email alert messages
* Protocol verification: HTTP, FTP, SMTP, POP, IMAP, NNTP, SSH, DWP, LDAPv2 and LDAPv3
* An HTTP interface with optional SSL support to make monit accessible from a web browser
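The port checking and protocol verification features above can be illustrated with a plain TCP connect plus an optional check of the first bytes the server sends. For SSH, RFC 4253 has the server send an identification string beginning with `SSH-`. Servers are allowed to send other lines first, which this sketch ignores.

This is a simplified stand-in for what monit does, not its code. The function name `check_port` is hypothetical.

```python
import socket
from typing import Optional


def check_port(host: str, port: int, timeout: float = 3.0,
               expect_prefix: Optional[bytes] = None) -> bool:
    """Return True if a TCP connection to host:port succeeds.

    If expect_prefix is given (e.g. b"SSH-"), the first bytes the server sends
    must also start with it: a crude form of protocol verification.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            if expect_prefix is None:
                return True
            banner = sock.recv(256)
            return banner.startswith(expect_prefix)
    except OSError:  # connection refused, host unreachable, or timed out
        return False
```

For example, `check_port("localhost", 22, expect_prefix=b"SSH-")` corresponds roughly to monit's `if failed port 22 protocol ssh` test.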

Install Monit in Debian

#apt-get install monit

This will complete the installation with all the required software.

Configuring Monit

The default configuration file is located at /etc/monit/monitrc; edit this file to configure your options.

Here is a sample configuration file; uncomment all of the following options:

## Start monit in background (run as daemon) and check the services at 2-minute
## intervals.
set daemon 120

## Set syslog logging with the 'daemon' facility. If the FACILITY option is
## omitted, monit will use 'user' facility by default. You can specify the
## path to the file for monit native logging.
set logfile syslog facility log_daemon

## Set list of mailservers for alert delivery. Multiple servers may be
## specified using comma separator. By default monit uses port 25 - it is
## possible to override it with the PORT option.
set mailserver localhost # primary mailserver

## Monit by default uses the following alert mail format:
##
## From: monit@$HOST # sender
## Subject: monit alert - $EVENT $SERVICE # subject
##
##
## Date: $DATE
## Action: $ACTION
## Host: $HOST # body
## Description: $DESCRIPTION
##
## Your faithful,

## You can override the alert message format or its parts such as subject
## or sender using the MAIL-FORMAT statement. Macros such as $DATE, etc.
## are expanded on runtime. For example to override the sender:
set mail-format { from: }

## Monit has an embedded webserver, which can be used to view the
## configuration, actual services parameters or manage the services using the
## web interface.
set httpd port 2812 and
use address localhost # only accept connection from localhost
allow localhost # allow localhost to connect to the server and
allow admin:monit # require user 'admin' with password 'monit'

# Monitoring the apache2 web service.
# It will check the apache2 process with the given pid file.
# If the process name or pidfile path is wrong, monit will
# report a failure even though apache2 is running.
check process apache2 with pidfile /var/run/

# Below are the actions monit takes when the service gets stuck.
start program = "/etc/init.d/apache2 start"
stop program = "/etc/init.d/apache2 stop"
# The admin will be notified by mail if any of the conditions below is satisfied.
if cpu is greater than 60% for 2 cycles then alert
if cpu > 80% for 5 cycles then restart
if totalmem > 200.0 MB for 5 cycles then restart
if children > 250 then restart
if loadavg(5min) greater than 10 for 8 cycles then stop
if 3 restarts within 5 cycles then timeout
group server

#Monitoring Mysql Service

check process mysql with pidfile /var/run/mysqld/
group database
start program = "/etc/init.d/mysql start"
stop program = "/etc/init.d/mysql stop"
if failed host port 3306 then restart
if 5 restarts within 5 cycles then timeout

#Monitoring ssh Service

check process sshd with pidfile /var/run/
start program = "/etc/init.d/ssh start"
stop program = "/etc/init.d/ssh stop"
if failed port 22 protocol ssh then restart
if 5 restarts within 5 cycles then timeout
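The rule forms in this configuration look like this:

- `if ... for 5 cycles then restart` acts only after the condition has held for several consecutive polling cycles.
- `if 5 restarts within 5 cycles then timeout` gives up after too many restarts in a short window.

In the monit versions we are aware of, `timeout` stops monitoring the service. The toy Python model below is a hypothetical class that only illustrates those semantics. It is not monit's implementation.

```python
from collections import deque


class CycleRule:
    """Toy model of two monit rule forms (hypothetical class, not monit's code):

    * "if <condition> for N cycles then restart"  -> condition must hold N cycles in a row
    * "if X restarts within Y cycles then timeout" -> stop monitoring after too many restarts
    """

    def __init__(self, for_cycles: int, max_restarts: int, within_cycles: int):
        self.for_cycles = for_cycles
        self.max_restarts = max_restarts
        self.recent = deque(maxlen=within_cycles)  # 1 if a restart happened in that cycle
        self.streak = 0
        self.monitored = True

    def cycle(self, condition_true: bool) -> str:
        """Evaluate one polling cycle and return the action taken."""
        if not self.monitored:
            return "unmonitored"
        self.streak = self.streak + 1 if condition_true else 0
        restart = self.streak >= self.for_cycles
        if restart:
            self.streak = 0  # the restarted service starts a fresh streak
        self.recent.append(1 if restart else 0)
        if sum(self.recent) >= self.max_restarts:
            self.monitored = False
            return "timeout"
        return "restart" if restart else "ok"
```

For instance, `CycleRule(for_cycles=1, max_restarts=5, within_cycles=5)` mirrors the mysql block above. There, `if failed host port 3306 then restart` fires on a single failed cycle.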

You can also include other configuration files via include directives:

include /etc/monit/default.monitrc
include /etc/monit/mysql.monitrc

This is only a sample configuration file. The configuration file is fairly self-explanatory; if you are unsure about an option, consult the monit documentation.

After configuring your monit file, you can check the configuration file syntax using the following command:

#monit -t

Once there are no syntax errors, enable the service by editing the file /etc/default/monit:

# You must set this variable to for monit to start


# You must set this variable to for monit to start

Now start the service using the following command:

#/etc/init.d/monit start

Monit Web interface

The monit web interface runs on port 2812. If there is a firewall in your network, you need to open this port.

Now point your browser to http://yourserverip:2812/ and log in with user admin and password monit. If you want a secure login, you can use HTTPS.

Monitoring Different Services

Here are some real-world configuration examples for monit. They show how a service is run, where it puts its pidfile, how to call the start and stop methods for a service, and so on.


Ortro is a Web-based system for scheduling and application monitoring. It allows you to run existing scripts on remote hosts in a secure way using ssh, create simple reports using queries from databases (in HTML, text, CSV, or XLS) and email them, and send notifications of job results using email, SMS, Tibco Rvd, Tivoli postemsg, or Jabber.

Release focus: Major feature enhancements

Changes: Support for i18n was added, and English and Italian languages are now available. More plugins were added, such as zfs scrub check, svc check, and zpool check for Solaris. Session check and tablespace check for Oracle and Check Uri were added. The mail, custom_query, ping, and www plugins were updated. There are bugfixes and improvements for the GUI such as the "add" button in the toolbar. The PEAR libraries were updated to the latest stable version.

Nagios offers an open source option for network monitoring

"One of the big flaws of enterprise monitoring is monitoring without context."
But wouldn't it be tough for IT managers to sell higher-ups on the virtues of an open source monitoring tool? It might be worth the effort, said James Turnbull, author of Pro Nagios 2.0. Turnbull spoke recently with Assistant Editor MiMi Yeh about how Nagios is different from its counterparts in the commercial world and why IT shops should give it a chance.

What sets Nagios apart from other open source network monitoring tools like Big Brother, OpenNMS, OpenView and SysMon?

James Turnbull: I think there are three key reasons why Nagios is superior to many other products in this area -- ease of use, extensibility and community. Getting a Nagios server up and running generally only takes a few minutes. Nagios is also easily integrated and extended either by being able to receive data from other applications or sending data to reporting engines or other tools. Lastly, Nagios has excellent documentation backed up with a great community of users who are helpful, friendly and knowledgeable. All these factors make Nagios a good choice for enterprise management in small, medium and even large enterprises.

... ... ...

What tips, best practices and gotchas can you offer to sys admins working with Nagios?

Turnbull: I guess the best recommendation I can give is read the documentation. The other thing is to ask for help from the community -- don't be afraid to ask what you think are dumb questions on Wikis, Web sites, forums or mailing lists. Just remember the golden rule of asking questions on the Internet -- provide all the information you can and carefully explain what you want to know.

Are there workarounds to address the complaint that Nagios has no auto-discovery, so individual IP addresses for each host and service must be defined?

Turnbull: I think a lot of the 'automated' discovery tools are actually more of a hindrance than a help. One of the big flaws of enterprise monitoring is monitoring without context. It's all well and good to go out across the network and detect all your hosts and add them to the monitoring environment, but what do all these devices do?

You need to understand exactly what you are monitoring and why. When something you are monitoring fails, you should know not only what that device is but also what the implications of that failure are. Nagios is not a business context/business process tool. The fact that you have to think clearly about what you want to monitor and how means that you are more aware of your environment and the components that make up that environment.

Is there any advice you would give to users?

Turnbull: The key thing to say to new users is to try it out. All you need is a spare server and a few hours and you can configure and experiment with Nagios. Take a few problem areas you've had with monitoring and see if you can solve them with Nagios. I think you'll be pleasantly surprised.

[Jul 25, 2007] monit

Dead-wood C-based application. It looks like it has an ad hoc language for describing checks.

Samba (Windows file/domain server)

Hint: For enhanced controllability of the service it is handy to split up the samba init file into two pieces, one for smbd (the file service) and one for nmbd (the name service).

 check process smbd with pidfile /opt/samba2.2/var/locks/
   group samba
   start program = "/etc/init.d/smbd start"
   stop  program = "/etc/init.d/smbd stop"
   if failed host port 139 type TCP  then restart
   if 5 restarts within 5 cycles then timeout
   depends on smbd_bin

 check file smbd_bin with path /opt/samba2.2/sbin/smbd
   group samba
   if failed checksum then unmonitor
   if failed permission 755 then unmonitor
   if failed uid root then unmonitor
   if failed gid root then unmonitor

 check process nmbd with pidfile /opt/samba2.2/var/locks/
   group samba
   start program = "/etc/init.d/nmbd start"
   stop  program = "/etc/init.d/nmbd stop"
   if failed host port 138 type UDP  then restart
   if failed host port 137 type UDP  then restart
   if 5 restarts within 5 cycles then timeout
   depends on nmbd_bin

 check file nmbd_bin with path /opt/samba2.2/sbin/nmbd
   group samba
   if failed checksum then unmonitor
   if failed permission 755 then unmonitor
   if failed uid root then unmonitor
   if failed gid root then unmonitor
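The `check file` blocks above compare a binary's checksum, permission bits, and owner against expected values. Here is a Python sketch of the same comparisons. It assumes the expected MD5 is supplied by the caller, whereas monit can record the checksum itself. Numeric uid/gid 0 stands for root. The function names are hypothetical, and this is not monit's code.

```python
import hashlib
import os
import stat


def md5_of(path: str, chunk: int = 65536) -> str:
    """MD5 hex digest of a file, read in chunks so large binaries are fine."""
    h = hashlib.md5()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def check_file(path: str, expected_md5: str, mode: int = 0o755,
               uid: int = 0, gid: int = 0) -> list:
    """Return the names of failed checks; an empty list means everything passed.

    Mirrors 'if failed checksum/permission/uid/gid then unmonitor' from the
    samba example; acting on the failures is left to the caller.
    """
    st = os.stat(path)
    failures = []
    if md5_of(path) != expected_md5:
        failures.append("checksum")
    if stat.S_IMODE(st.st_mode) != mode:
        failures.append("permission")
    if st.st_uid != uid:
        failures.append("uid")
    if st.st_gid != gid:
        failures.append("gid")
    return failures
```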

[Jun 19, 2007] Simple System Thermometer (systher)

Systher is a small Perl tool that collects system information and presents it as an XML document. The information is collected using standard Unix tools, such as netstat, uptime and lsof.

Systher can be used in many ways:

In order to make the obtained information readable for humans, Systher is equipped with an XSLT processing stylesheet to convert the XML information into HTML. That way, the information can be made visible in a browser.
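The Systher approach is to collect system facts and emit them as XML that an XSLT stylesheet can turn into HTML, for example with xsltproc. The sketch below assumes a Unix host. It uses Python's standard library instead of shelling out to netstat, uptime, or lsof. The element and attribute names are hypothetical, not Systher's actual schema.

```python
import os
import shutil
import time
import xml.etree.ElementTree as ET


def system_snapshot_xml(mount_points=("/",)) -> str:
    """Collect a few system metrics and return them as an XML string (Unix only).

    Hypothetical schema for illustration; Systher's own XML layout differs.
    """
    root = ET.Element("system", host=os.uname().nodename,
                      timestamp=str(int(time.time())))
    one, five, fifteen = os.getloadavg()
    ET.SubElement(root, "loadavg",
                  min1=f"{one:.2f}", min5=f"{five:.2f}", min15=f"{fifteen:.2f}")
    for mount in mount_points:
        usage = shutil.disk_usage(mount)
        ET.SubElement(root, "filesystem", mount=mount, total=str(usage.total),
                      used=str(usage.used), free=str(usage.free))
    return ET.tostring(root, encoding="unicode")
```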

[May 29, 2007] ZABBIX 1.4 (Stable) by Alexei Vladishev

About: ZABBIX is an enterprise-class distributed monitoring solution for networks and applications. Native high-performance ZABBIX agents allow monitoring of performance and availability data of all operating systems.

Changes: This release introduces support of centralized distributed monitoring, flexible auto-discovery, advanced Web monitoring, and much more.

[Apr 11, 2007] Unix Server Monitoring Scripts

A collection of a dozen scripts, some of them in Perl.

Unix Server Monitoring Scripts is a suite that will monitor Unix disk space, Web servers via HTTP, and the availability of SMTP servers via SMTP. It will save a history of these events to diagnose and pinpoint problems. It also sends a message via email if a Web server is down or if disk usage exceeds one of two thresholds. Each script acts independently of the others.
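The two-threshold disk check described above can be sketched in Python. The 80% and 90% thresholds and the function names are hypothetical, and mail delivery is left out. This is not the suite's own code.

Usage is computed as used / (used + free). Python's `shutil.disk_usage` reports `free` as space available to unprivileged users, so this should roughly match df's Use% column, which ignores blocks reserved for root.

```python
import shutil

OK, WARNING, CRITICAL = "OK", "WARNING", "CRITICAL"


def classify(used_pct: float, warn_pct: float = 80.0, crit_pct: float = 90.0) -> str:
    """Map a usage percentage onto two thresholds; the higher threshold wins."""
    if used_pct >= crit_pct:
        return CRITICAL
    if used_pct >= warn_pct:
        return WARNING
    return OK


def disk_status(path: str = "/", warn_pct: float = 80.0, crit_pct: float = 90.0):
    """Return (status, used percent) for the filesystem holding `path`."""
    usage = shutil.disk_usage(path)
    capacity = usage.used + usage.free  # excludes root-reserved blocks, roughly like df
    used_pct = 100.0 * usage.used / capacity if capacity else 0.0
    return classify(used_pct, warn_pct, crit_pct), round(used_pct, 1)
```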


[Apr 11, 2007] Open source network monitoring -- An open alternative by Andrew R. Hickey

Zenoss is built on the python-based Zope Application server. Zenoss uses NetSNMP to collect data via SNMP, data is stored in MySQL, and data is logged by RRDtool.
Feb 08, 2007 |

Network monitoring and management applications can be costly and cumbersome, but recently a host of companies have sprung forth offering an open source alternative to IBM Tivoli, HP OpenView, CA and BMC -- and they're starting to gain traction.

The major commercial software vendors, known as the "big four," are frequently criticized for their high cost and complexity and, in some cases, are chided for being too robust -- having too many features that some enterprise users may find completely unnecessary.

Many of the open source alternatives are quick to admit that their solutions aren't for everyone, but they bring to the table arguments in their favor that networking pros can't ignore, namely low cost and ease of use.

"Open source is a huge phenomenon," Zenoss CEO and co-founder Bill Karpovich said. "It's providing an alternative for end users."

Zenoss makes Core, an integrated IT monitoring product that lets IT admins manage the status and health of their infrastructure through a single Web-based console. The latest version of the free, open source software features automated change tracking, automatic remediation, and expanded reports and export capabilities.

According to Karpovich, Zenoss software monitors complete networks, servers, applications, services, power and related environments. The biggest benefit, however, is its openness, meaning that users can tailor it to their systems any way they choose.

"It's complete enterprise IT monitoring," Karpovich said. "It's network monitoring and management, application management, and server management all through a single pane of glass."

Flexibility included

Some users have said the Tivolis and OpenViews of the world are hard to customize and very inflexible, but open source alternatives are often the opposite. They are known for their flexibility. "You can use the product as you want," Karpovich said.

Nagios developer Ethan Galstad said flexibility is a major influence on enterprises looking to move ahead with an open source monitoring project. Nagios makes open source software that monitors network availability and the states of devices and services.

"You have as an end user much more influence on the future of the feature set," Galstad said, adding that through the open source community, end users can request a feature they want, discuss the pros and cons and, in many cases, implement that feature within a relatively short time.

And for things that Nagios and other open source monitoring tools don't do, end users can tie the tools in with other solutions to create the environment they want.

"There are a lot of hooks," Galstad said.

[Apr 10, 2007] Configure OpenNMS Step By Step by saad khan

2006-07-28 |

OpenNMS is an open source enterprise network management tool. It helps network administrators monitor critical services on remote machines and collects information about remote nodes using SNMP. OpenNMS has a very active community, where you can register to discuss your problems. OpenNMS installation and configuration normally take time, but I have tried to cover them in a few steps.

OpenNMS provides the following features.

ICMP Auto Discovery
SNMP Capability Checking
ICMP polling for interface availability
HTTP, SMTP, DNS and FTP polling for service availability
Fully distributed client server architecture
JAVA Real-time console to allow moment-to-moment status of the network
XML using XSL style web access and reporting
Business View partitioning of the network using policies and rules
Graphical rule builder to allow graphical drag/drop relationships to be built
JAVA configuration panels
Redundant and overlapping pollers and master station
Repeating and One-time calendaring for scheduled downtime

The source code of OpenNMS is available for download as a production release (stable) and a development release (unstable); I have used the 1.2.7 stable release in this howto.

I have tested this configuration with Redhat/Fedora, Suse, Slackware, Debian and it works smoothly. I am assuming that readers already have Linux background. You can use the following configuration for other distributions too. Before you start OpenNMS installation, you need to install following packages:

[Apr 10, 2007] Network Monitoring with Zabbix by ovis

March 10, 2006 |

Zabbix can monitor just about any event on your network, from network traffic to how many sheets of paper are left in your printer. It produces really cool graphs.

In this howto we install software that has an agent and a server side. The goal is to end up with a setup that has a nice web interface that you can show off to your boss ;)
It's a great open source tool that lets you know what's out there.
This howto will not go into setting up the network, but I might rewrite it one day, so I would really like your input on this. Much of what is covered here is in the online documentation; however, if you are new to all this, as I am, it might be of some help to you.

[Apr 9, 2007] GroundWork Monitor Open Source

GroundWork unifies leading open source projects like Nagios, Ganglia, RRDtool, Nmap, Sendpage, and MySQL, and offers a wide range of support for operating systems (Linux, Unix, Windows, and others), applications, and networked devices for complete enterprise-class monitoring.

Release focus: Major feature enhancements
New features include:

- Incorporation of RRD data: enhancing GWMOS with other tools that use RRDs should be much easier
- Performance graphing of historical data using the RRD data
- UI improvements to give you access to information of interest, with fewer clicks, in a cleaner interface

In addition to the downloadable source tarball, the SVN repository is also accessible.

GroundWork Monitor Open Source (GWMOS) 5.1-01 Bootable ISO now available: this image should boot cleanly in any ix86-compatible computer, or boot the image in a virtualized environment such as VMWare or Xen. It's a simple, super fast mechanism for evaluating GWMOS while setting up temporary monitoring quickly at any site: just pop in the CD and boot!

The GroundWork Monitor Open Source Bootable ISO automatically boots, logs you in, launches Firefox, and starts up GroundWork with all the associated services such as apache, Nagios(R), MySQL, and RRDtool, etc. all loaded and running.

The ISO is set up with included profiles to monitor the host system and two internet sites out-of-the-box, giving you some immediate data to observe without setting up any additional devices. When booted from a physical CD, everything runs in the computer's RAM: the hard drive of the host computer is never touched.

Have fun, and keep us posted on your experience at

[Mar 12, 2007] Zabbix State-of-the-art network monitoring

I have used BigBrother and Nagios for a long time to troubleshoot network problems, and I was happy with them -- until Zabbix came along. Zabbix is an enterprise-class open source distributed monitoring solution for servers, network services, and network devices. It's easier to use and provides more functionality than Nagios or BigBrother.

Zabbix is a server-agent type of monitoring software, meaning you have a Zabbix server where all gathered data is collected, and a Zabbix agent running on each host.

All Zabbix data, including configuration and performance data, is stored in a relational database -- MySQL, PostgreSQL, or Oracle -- on the server.

Zabbix server can run on all Unix/Linux distributions, and Zabbix agents are available for Linux, Unix (AIX, HP-UX, Mac OS X, Solaris, FreeBSD), Netware, Windows, and network devices running SNMP v1, v2, and v3.

[Mar 05, 2007] OpenNMS bests OpenView and Tivoli while Ipswitch spreads the FUD by Dave Rosenberg

I strongly doubt that this is FUD. It looks like a pretty realistic assessment of the situation.
March 05, 2007 | InfoWorld


Chalk up another victory for OSS over proprietary. OpenNMS beat out both OpenView and Tivoli in the SearchNetworking Product Leadership Awards. I wonder if that will shut up this ridiculous FUD from Ipswitch: "Don't trust your network to open source."

I let Travis take the shots at this foolishness...wake up, Ipswitch, you are late to the FUD train. Javier...anything from you?

Myth #1 - Open Source is free - According to Greene, downloading open source from the Internet and then customizing to your environment "often is not a good use of your time." Greene adds that he'd "rather pay an upfront fee for software that does what I need and doesn't have any high-cost labor attached to it."
Hmmm ... what about the fact that proprietary software (and *especially* network monitoring and management products) is often tremendously difficult to install / configure / maintain ongoing? How is being held hostage to a vendor for support / installation / configuration preferable? And how is being tied to a predetermined feature set preferable to having the ability to customize an open source solution to meet your environment's needs?
Myth #2 - Bug fixes are faster and less expensive in an open source environment - the second "myth" that Greene exposes around open source is the notion that there are thousands of developers sitting at home contributing labor for free. Greene suggests that most of the contributing developers are typically employed by large vendors, and that "even when those individuals generously offer their time for free, can you really afford to wait for one to agree with you on the urgency of action if your network is down?"
Hmmm it's better NOT to have access to the source code when you have a bug? It's preferable to have to open a help ticket with the vendor and wait in line? It's better NOT to have general visibility into the bugs and issues being reported by the members of the user community?
Myth #3 - Your IT staff can buy a 'raw' tool and shape it to their needs - Greene's last point is that the industry has moved away from the "classic open source" model where folks download raw open source and customize to their needs - and to more of a commercial open source model, where organizations are leveraging open source distribution as a way to sell services.



Not a very valid comparison, as there are many products out there that do a far better job than HP OpenView, OpenNMS, or Tivoli.

If you are an OSS type supporter in terms of your business model, it would make financial sense to use OpenNMS, but in terms of best of breed this OSS product does not come close. Some might argue that using OSS software will cost you more, as there are very few people who know how to use it, and I mean really use it: not some Linux script kiddie but someone with enterprise management experience. These days it's not about implementation, it's about integration, and the comparison should be about how nicely it plays with the rest of my environment.

I don't see EMC SMARTS in the comparison list.

I am all for OSS software as long as it is not chosen as the cheapest option but rather as the best of breed option. As for NMS commercial software, I use it day in and day out and would like to see a more open model in terms of functionality and development.

Take a leaf out of Sun's book: OpenSolaris has proven to be a good business model for a commercial company, and the benefits will be seen for years to come.

Posted by: James at March 8, 2007 04:34 AM

[Mar 1, 2007] Network and IT management platforms 2007 Products of the Year


The network is the central nervous system of the modern enterprise -- complex and indispensable. Keeping tabs on how that enterprise is functioning requires a sophisticated "big picture" management system that can successfully integrate with other network and IT products. Unfortunately, many products in this category are just too expensive for any but the largest companies (with the most generous IT budgets) to afford.

Enter OpenNMS, the gold medal winner in our network and IT management platforms category. The open source enterprise-grade network management system was designed as a replacement for more expensive commercial products such as IBM Tivoli and HP OpenView. It periodically checks that services are available, isolates problems, collects performance information, and helps resolve outages. And it's free.

In our Product Leadership survey, readers praised OpenNMS for being easy to customize, easy to integrate and -- of course -- free. These attributes are all characteristic of any open source product. Because of its open source nature, OpenNMS has a community of developers contributing to its code. The code is open for anyone to view or adapt to suit individual needs.

Consequently, users can customize OpenNMS in ways that are limited only by their abilities and imagination -- not by licensing restraints. One reader said, "It is an open source product, so we can customize it easily." With traditional proprietary products, it may be difficult to find one piece of software that can manage the network effectively for every enterprise, but OpenNMS was designed to allow users to add management features over time. Its intentional compatibility with other open source (and proprietary) products provides seamless integration, requiring less piecemeal coding to fit things together.

Users of OpenNMS can also take advantage of the user community accessible through the OpenNMS Web site for answers to questions and help in troubleshooting problems. While one survey respondent remarked that "open source is advancing slowly to address some of the manageability issues," members of the OpenNMS mailing list are quick to answer any request with a friendly, knowledgeable response. For companies whose IT personnel are not afraid of an unconventional approach, the open source community provides support that is just as reliable as that of a commercial vendor -- and in many cases, more helpful.

But OpenNMS is not a "you get what you pay for" product, either. Readers said it "works great" and "significantly helped our network's bandwidth and packet management and controlled 'rogue' clients." Others found that it "works fine for a small business network" and is an "outstanding option." Even those whose experience was less positive found that any challenges were surmountable, such as the reader who said, "Since it's free, it was worth the effort."

Unix Monitoring Scripts by Damir Delija

Sys Admin

It is impossible to do systems administration without monitoring and alerting tools. Basically, these tools are scripts, and writing such monitoring scripts is an ancient part of systems administration that's often full of dangerous mistakes and misconceptions.

The traditional way of putting systems together is very stochastic and erratic, and that same method is often followed when developing monitoring tools. It is really rare to find a system that's been properly planned and designed from the start. The usual approach when something goes wrong is just to patch the immediate problem. Often, there are strange results from people making mistakes when they're in a hurry and under pressure.

Monitoring scripts are traditionally fired from root cron and send results by email. These emails can accumulate over time, flooding people with strange mails, creating problems on the monitored system, and causing other unexpected situations. Such scenarios are often unavoidable, because few enterprises can afford better measures than firefighting. In this article, I will mention a few tips that can be helpful when developing monitoring scripts, and I will provide three sample scripts.

What is a Unix Monitoring Script?

A monitoring tool or script is part of system management and to be really efficient must be part of an enterprise-wide effort, not a standalone tool. Its purpose is to detect problems and send alerts or, rarely, to try to correct the problem. Basically, a monitoring/alerting tool consists of four different parts:

  1. Configuration -- Defines the environment and does initializations, sets the defaults, etc.
  2. Sensor -- Collects data from the system or fetches pre-stored data.
  3. Conditions -- Decides whether events are fired.
  4. Actions -- Takes action if events are fired.
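The four parts described above can be kept as separate functions, so that each one can change without touching the others. The following is a minimal Python sketch, not the article's own code (its listings are not reproduced here): it watches a single filesystem, and the mount point, the 90% limit, and all function names are illustrative assumptions.

```python
import shutil

# Configuration: the mount point and limit are illustrative assumptions.
CONFIG = {"mount": "/", "max_used_pct": 90.0}


def sensor(mount):
    """Sensor: percentage of the filesystem's total size that is used."""
    usage = shutil.disk_usage(mount)
    return 100.0 * usage.used / usage.total


def condition(used_pct, limit):
    """Condition: the event fires when usage exceeds the limit."""
    return used_pct > limit


def action(mount, used_pct):
    """Action: print an alert; a real script might send mail or a trap."""
    print(f"WARNING: {mount} is {used_pct:.1f}% full")


def run(config=CONFIG, read=sensor, act=action):
    """Wire the four parts together; return True if the event fired."""
    value = read(config["mount"])
    fired = condition(value, config["max_used_pct"])
    if fired:
        act(config["mount"], value)
    return fired


if __name__ == "__main__":
    run()
```

Passing `read` and `act` as parameters is the "abstraction layer" in miniature: a test or a different sensor can be plugged in without editing the condition logic.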

If these elements are simply bundled into a script without thought, the script will be ineffective and hard to adapt. Good tools also include an abstraction layer that simplifies later modifications.

To begin, we have to set some values, do some sanity checks, and even determine whether monitoring is allowed. In some situations, it is good to stop monitoring through the control file to avoid false notifications, during maintenance for example. This is all done in the configuration part of the script.
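A configuration step that honors a maintenance switch can be as small as a file-existence test. In this sketch the control file path `/var/run/monitoring.disabled` and the list of required commands are hypothetical; adjust them to local conventions.

```python
import os
import shutil

# Hypothetical control file: while it exists, the script stays silent.
CONTROL_FILE = "/var/run/monitoring.disabled"


def monitoring_allowed(control_file=CONTROL_FILE):
    """Return False during maintenance, signalled by the control file."""
    return not os.path.exists(control_file)


def missing_commands(required=("df", "sar")):
    """Sanity check: list required external commands absent from PATH."""
    return [cmd for cmd in required if shutil.which(cmd) is None]


if __name__ == "__main__":
    if not monitoring_allowed():
        raise SystemExit(0)  # maintenance window: exit quietly
    missing = missing_commands()
    if missing:
        raise SystemExit(f"cannot monitor, missing: {' '.join(missing)}")
```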

The script collects values from the system -- from monitored processes or the environment. This data collecting is done by the sensor part. This data can be the output of an external command or can be fetched from previously stored values, such as the current df output or previously stored df values (see Listing 1).
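As an illustration of a sensor that reads both a live value and a previously stored one, the sketch below parses POSIX `df -P` output and keeps the last reading in a JSON file. The JSON state file and its location are assumptions of this sketch, not part of the article.

```python
import json
import subprocess


def parse_df_p(text):
    """Parse POSIX `df -P` output into {mount_point: percent_used}."""
    usage = {}
    for line in text.strip().splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) < 6:
            continue
        pct = fields[4].rstrip("%")
        if not pct.isdigit():
            continue  # e.g. "-" for some pseudo filesystems
        # Fields: filesystem, blocks, used, available, capacity, mount point
        usage[" ".join(fields[5:])] = int(pct)
    return usage


def current_df():
    """Live reading: run `df -P` and parse its output."""
    out = subprocess.run(["df", "-P"], capture_output=True, text=True, check=True)
    return parse_df_p(out.stdout)


def load_previous(path):
    """Previously stored values, or {} on the first run."""
    try:
        with open(path) as fh:
            return json.load(fh)
    except FileNotFoundError:
        return {}


def save_current(path, usage):
    """Store this run's values for the next comparison."""
    with open(path, "w") as fh:
        json.dump(usage, fh)
```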

The conditions part of the script defines the events that are monitored. Each condition detects whether an event has happened and whether this is the start or the end of the event (arming or rearming). This process can compare current values to predefined limits or to stored values, if we are interested in rates instead of absolute values. Events can also be based on composite or calculated values, such as "Average idle from sar for the last 5 minutes is less than 10%" (see Listing 2).

Results at this level are logical values usually presented as some kind of empty/not-empty string, to be easily manipulated in later usage. The key is to have some point in the code where the clear status of the event is defined, so branching can be done simply and easily.
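Conditions like these reduce to small functions that return a plain boolean, which keeps the later branching simple. The sketch below shows a rate computed from a stored sample and a composite average condition; it assumes the %idle samples (for example from `sar`) have already been collected, and the thresholds are placeholders.

```python
def rate_per_second(prev_value, prev_time, cur_value, cur_time):
    """Rate of change between a stored sample and the current one."""
    elapsed = cur_time - prev_time
    if elapsed <= 0:
        raise ValueError("current sample must be newer than the stored one")
    return (cur_value - prev_value) / elapsed


def growing_too_fast(prev_value, prev_time, cur_value, cur_time, max_rate):
    """Rate-based condition, e.g. a filesystem filling faster than max_rate."""
    return rate_per_second(prev_value, prev_time, cur_value, cur_time) > max_rate


def average_below(samples, limit):
    """Composite condition, e.g. average %idle over recent samples < limit."""
    if not samples:
        return False  # no data: do not fire
    return sum(samples) / len(samples) < limit
```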

Actions consist of specific code that is executed in the context of a detected event, such as storing new values, sending traps, sending email, or performing some other automatically triggered action. It is good to put these into functions or separate scripts, since you can have similar actions for many events. Usually we want to send email to someone or send a trap. It is almost always the same code in all scripts, so keeping it separate is a good idea.
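Because most events end in the same kind of notification, the action code can live in one shared helper. Here is a minimal sketch using Python's standard `email` and `smtplib` modules. The addresses and the local SMTP relay on `localhost` are illustrative assumptions.

```python
import smtplib
from email.message import EmailMessage


def build_alert(event, severity, text, sender, recipient):
    """Shared action helper: format one alert as an email message."""
    msg = EmailMessage()
    msg["Subject"] = f"[{severity}] {event}"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(text)
    return msg


def send_alert(msg, smtp_host="localhost"):
    """Deliver via an SMTP relay (assumed to listen on smtp_host:25)."""
    with smtplib.SMTP(smtp_host) as smtp:
        smtp.send_message(msg)
```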

It is important to add some state support. We are not just interested in detecting limit violations; if that were the case, we would be flooded with messages. Detecting state changes can reduce unwanted messaging. When we define an event in which we are interested, we actually want to know when the event happened and when it ended -- that is, when the monitored values passed limits and when they returned. We are not interested in full-time notification that the event is still occurring. Thus, we need to know the change of the event state and value of the monitored variable.

State support is not necessary if there is some kind of console that can correlate notifications. In the simplest implementations, like a plain monitoring script, avoiding message flooding directly in the script itself is useful.
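The state tracking described above needs only the previous state of each event. This sketch reports a transition when an event starts (arming) or ends (rearming), and stays silent while nothing changes. Keeping the state in a JSON file between cron runs is an assumption of the sketch, not something the article prescribes.

```python
import json


def transition(was_active, violating_now):
    """Return 'start', 'end', or None when the event state did not change."""
    if violating_now and not was_active:
        return "start"
    if was_active and not violating_now:
        return "end"
    return None


def update_state(state, event_name, violating_now):
    """Record the new state in place and report only state changes."""
    change = transition(state.get(event_name, False), violating_now)
    state[event_name] = violating_now
    return change


def load_state(path):
    """State saved by the previous run, or {} if there is none."""
    try:
        with open(path) as fh:
            return json.load(fh)
    except FileNotFoundError:
        return {}


def save_state(path, state):
    with open(path, "w") as fh:
        json.dump(state, fh)
```

Four consecutive runs that see "ok, over limit, over limit, ok" produce only two notifications, one when the limit is crossed and one when the value returns.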

Each event must have a unique name and severity level. Usually, three levels of severity are enough, but sometimes five levels are used. It is best to start with a simple model such as:

Info -- Just information that something has happened
Warning -- Warning of possible dangerous situation
Fatal -- Critical situation
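One hedged way to enforce unique event names and a fixed severity scale in code is shown below. The class and method names are hypothetical. The three levels mirror the simple model above.

```python
from enum import IntEnum


class Severity(IntEnum):
    INFO = 1     # just information that something has happened
    WARNING = 2  # warning of a possibly dangerous situation
    FATAL = 3    # critical situation


class EventRegistry:
    """Holds every event's unique name and its severity."""

    def __init__(self):
        self._events = {}

    def register(self, name, severity):
        if name in self._events:
            raise ValueError(f"duplicate event name: {name}")
        self._events[name] = Severity(severity)

    def severity(self, name):
        return self._events[name]
```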




Damir Delija has been a Unix system engineer since 1991. He received a Ph.D. in Electrical Engineering in 1998. His primary job is systems administration, education, and other system-related activities.

Automating UNIX Security Monitoring by Robert Geiger and John Schweitzer

Sys Admin

All of the scripts listed in this article are meant to be run from cron on a regular basis -- daily or hourly, depending on the routine in question -- with the output going to either email or to the systems administrator's pager. However, none of the things described in this article are foolproof. UNIX security mechanisms are only relevant if the root account has not been compromised. For example, scripts run through crontab can be easily disabled or modified if the attacker has attained root access, and most log files can be manipulated to cover tracks if the intruder has control over the root account.
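A common check of this kind compares file checksums against a saved baseline. The sketch below uses SHA-256 via Python's `hashlib`. It is an illustration, not one of the article's scripts. Given the caveat above, the baseline should be kept somewhere an intruder with root on the monitored host cannot rewrite it.

```python
import hashlib
import os


def file_digest(path, chunk_size=65536):
    """SHA-256 of a file, read in chunks so large files are fine."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def snapshot(paths):
    """Map each existing watched file to its current digest."""
    return {p: file_digest(p) for p in paths if os.path.isfile(p)}


def compare(baseline, current):
    """Return (changed, missing, added) path lists, sorted."""
    changed = sorted(p for p in baseline if p in current and baseline[p] != current[p])
    missing = sorted(p for p in baseline if p not in current)
    added = sorted(p for p in current if p not in baseline)
    return changed, missing, added
```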

[Feb 23, 2007] Re: [SAGE] Mon vs BB vs

I tested out OpenNMS but found Nagios easier to get running; plus, OpenNMS was very Linux-centric last I checked. That is annoying, since it looks like just a Java application, so there's no reason it couldn't be made to run elsewhere.

Anyway, as far as I can tell Nagios does everything OpenNMS does and more. As a network monitoring tool it's been great, I have it polling all of our SNMP enabled devices and receiving traps. With the host and service dependencies it becomes easier to see if the cause of an application failure is software, hardware, or network based.

That being said I would still love to play with OpenNMS if anyone has a way to get it to work under FreeBSD.
On Thursday 10 October 2002 04:52 pm, Alan Horn wrote:
> On 10 Oct 2002, Stephen L Johnson wrote:
> >If you are mainly monitoring networks, network monitoring tools are
> >better. The non-commercial tools that I have looked at are OpenNMS and
> >Nagios (NetSaint). These tools are designed mainly to monitor networks.
> >Systems monitoring can be added as well.
> Nagios is primarily for monitoring network _services_ in its default
> install (via the nagios plugins you get with the tool), not for monitoring
> network devices (although it'll do that too). I just wanted to clarify
> that since I read this as 'nagios for monitoring cisco kit etc...' By
> network services I mean stuff like DNS, webservers, smtp, imap, etc... All
> the services that you probably want to monitor first of all when you set
> out to do this.
> Adding systems monitoring with nagios is very nice indeed, using the NRPE
> (Nagios Remote Plugin Executor) module, you can run whatever arbitrary
> code you desire on your system, and return results back to the monitor. I
> have it monitoring diskspace on critical fileservers, health of some
> custom applications etc...
> I've used nagios, nocol, and big brother (many many moons ago.. it's
> evolved since I used it). Nagios most recently. Nagios takes a bit of work
> to setup due to its flexibility, but I've found it to be the best for my
> needs in both a single and multi-site situation (we have branch offices
> located around the world via VPN which need to be monitored).
> And the knowledge of network topology is great too !
> Hope this helps.
> Cheers,
> Al
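For the NRPE approach Al describes, a custom check is just a program that follows the Nagios plugin conventions. It exits with code 0, 1, 2 or 3 for OK, WARNING, CRITICAL or UNKNOWN, and prints one line of status text, optionally followed by `|` and performance data. Below is a minimal disk-space sketch. The thresholds and the output wording are illustrative assumptions.

```python
import shutil
import sys

# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_disk(path, warn_pct, crit_pct, usage_fn=shutil.disk_usage):
    """Return (exit_code, status_line) for the filesystem holding path."""
    try:
        usage = usage_fn(path)
    except OSError as exc:
        return UNKNOWN, f"DISK UNKNOWN - {path}: {exc}"
    used_pct = 100.0 * usage.used / usage.total
    if used_pct >= crit_pct:
        code, label = CRITICAL, "CRITICAL"
    elif used_pct >= warn_pct:
        code, label = WARNING, "WARNING"
    else:
        code, label = OK, "OK"
    # Text before '|' is shown to humans; after it is performance data.
    perf = f"used_pct={used_pct:.1f}%;{warn_pct};{crit_pct}"
    return code, f"DISK {label} - {path} {used_pct:.1f}% used|{perf}"


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "/"
    exit_code, line = check_disk(target, warn_pct=80, crit_pct=90)
    print(line)
    sys.exit(exit_code)
```

On the monitored host such a script would be registered as a command in the NRPE configuration and invoked from the Nagios server through `check_nrpe`.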

[Feb 23, 2007] Re: Starting

David Nolan
Fri, 08 Sep 2006 05:49:55 -0700

On 9/3/06, Toddy Prawiraharjo <toddyp@...> wrote:
> Hello all,
> I am looking for alternative to Nagios (or should i stick with it? need
> opinions pls), and saw this Mon.

The choice between Mon and other OSS monitoring systems like Nagios, Big Brother or any of the others is very much dependent upon your needs.

My best summary of Mon is that it's monitoring for sysadmins. It's not pretty, it's not designed for management; it's designed to allow a sysadmin to automate the performance monitoring that might otherwise be done ad hoc or with cron jobs. It doesn't trivially provide the typical statistics gathering that many bean-counters are looking for, but it's extensible and scalable in amazing ways. (See recent posts on this list about one company deploying a network of 2400 mon servers and 1200 locations, and my mon site which runs 500K monitoring tests a day, some of those on hostgroups with hundreds of hosts.)

> Btw, i need some auto-monitoring tools to monitor basic unix and windows based services, such as nfs, sendmail, smb, httpd, ftp, diskspace, etc.

> I love perl so much, but then its been long time since it's been updated. Is it still around and supported?

If you love perl, Mon may be perfect for you, because if there is a feature you need you can always send us a patch. :)

It's definitely still around and supported. (I just posted a link to a mon 1.2.0 release candidate.) There haven't been a lot of updates to the system in the last couple of years, but that's in part because the system is pretty stable as-is. There are certainly some big-picture changes we would like to make, but none of the current developers have had pressing reasons to work on the system. Personally, most of my original patches were based on CMU's needs when we did our Mon deployment, and since that time no major internal effort has been spent on extending the system. A review process of our monitoring systems is just starting now, and that may result in either more programmer time being allocated to Mon or CMU moving away from Mon to some other system. (Obviously I'd be unhappy with that result, but I would continue to work with Mon both personally and in my consulting work.)

> Any good reference on the web interface? (the one from the site, is dead).

I believe the most commonly used interface is mon.cgi, maintained by Ryan Clark, available at

An older version of mon.cgi is included in the mon distribution.

> And most importantly, where to
> start? (any good documentation as starting point on how to use this Mon)

Start by reading the documentation, looking at the sample config file, and experimenting. A small installation can be set up in a matter of minutes. Once you've done a proof-of-concept install you can decide if Mon is right for you.
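As we understand the mon documentation, a Mon monitor is an ordinary executable with a simple contract. It receives the hosts to test as arguments, prints a summary line listing the hosts that failed (with optional detail lines after it), and exits nonzero on failure. The sketch below follows that convention for a TCP port check. The port number and function names are hypothetical, so check the mon man page before relying on the exact contract.

```python
import socket
import sys

PORT = 25  # hypothetical: the service port this monitor checks


def tcp_port_open(host, port, timeout=5.0):
    """True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def run_monitor(hosts, port, probe=tcp_port_open):
    """Return (exit_status, output): summary line of failed hosts, then details."""
    failed = [h for h in hosts if not probe(h, port)]
    lines = [" ".join(failed)] + [f"{h}: port {port} not reachable" for h in failed]
    return (1 if failed else 0), "\n".join(lines)


if __name__ == "__main__":
    status, output = run_monitor(sys.argv[1:], PORT)
    print(output)
    sys.exit(status)
```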


[Feb 23, 2007] [BBLISA] GPL system monitoring tools (alternatives to nagios)

Nov 27, 2006

I'm looking for suggestions for any GPL/opensource system monitoring tools that folks can recommend.

FYI we've been using Nagios for about 6 months now with mixed results. While it works, we've had to do an awful lot of customization and writing our own checks (mostly application-level stuff for our proprietary software).

I think we would be a lot happier with something simpler and more flexible than Nagios. Right now it's a choice between further hacking of Nagios vs. "roll our own" (the latter, I think, will be much more maintainable over the long run). But of course I'm looking to avoid reinventing the wheel as much as possible.

Any feedback or pointers are much appreciated.

thanks, JB


Re: [BBLISA] GPL system monitoring tools? (alternatives to nagios)

Jason Qualkenbush
Tue, 28 Nov 2006 06:35:56 -0800

I don't know about that. Nagios is really a roll-your-own solution. All it really does is manage the polling intervals between checks. Just about everything else is something most people are going to write custom for their environments.

Just make sure you limit the active checks to simple things like ping, url, and some port checking. The system health checks (like disks, cpu usage, application checks) are really best done on the host itself. Just run a cron (or whatever the windows equivalent is) job that checks the system and submits the results to the nagios server via a passive check.

What customizations are you doing? The config files? What exactly is Nagios failing to do?
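Jason's suggestion of host-side checks submitted as passive results can be sketched as follows. Nagios accepts a passive service result as a `PROCESS_SERVICE_CHECK_RESULT` line written to its external command file. The path below is an assumption (see `command_file` in nagios.cfg). Writing to the pipe directly only works on the Nagios server itself, so remote hosts typically relay results through an add-on such as NSCA.

```python
import time

# Assumed path: use the command_file setting from your nagios.cfg.
COMMAND_FILE = "/usr/local/nagios/var/rw/nagios.cmd"


def passive_result_line(host, service, return_code, output, timestamp=None):
    """Format one PROCESS_SERVICE_CHECK_RESULT external command."""
    if timestamp is None:
        timestamp = int(time.time())
    # An external command is a single line, so flatten any newlines.
    output = output.replace("\n", " ")
    return (f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;"
            f"{host};{service};{return_code};{output}\n")


def submit(line, command_file=COMMAND_FILE):
    """Write the command into Nagios's command pipe (on the Nagios server)."""
    with open(command_file, "w") as pipe:
        pipe.write(line)
```

The return code uses the same 0/1/2/3 scale as active plugins, so a cron job can reuse an existing check and forward its result unchanged.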

Re: [BBLISA] GPL system monitoring tools? (alternatives to nagios)

John P. Rouillard

Tue, 28 Nov 2006 12:48:17 -0800

In message <[EMAIL PROTECTED]>,
"Scott Nixon" writes:
>We have been looking at OpenNMS. It is developed full time
>by the OpenNMS Group. It was designed from the ground up to
>be an Enterprise class monitoring solution. If you're interested, I'd
>suggest listening to this podcast with the *manager* of OpenNMS, Tarus
>Balog.

I talked with Mr. Balog at the 2004 LISA IIRC. The big thing that makes opennms a non-starter for me was the inability to create dependencies between services. It's a pain to do in nagios, but it's there, and that is a critical tool for enterprise-level operations. A fast perusal of the OpenNMS docs doesn't show that feature.

Compared to nagios the OpenNMS docs seem weak.

Also, at the time all service monitors had to be written in Java. I think there were plans to make a shell connector that would allow you to run any program and feed its output back to OpenNMS. That means all the nagios plugins could be used with a suitable shell wrapper.

OpenNMS had a much nicer web interface and better access control IIRC. But at the time I don't think you could schedule downtime in the web interface. Also, I just looked at the demo and didn't see it (but that may be because it's a demo).

On the nice side, having multiple operational problem levels (5/6 IIRC) rather than nagios's 3 (ok, warning, and critical) was something I wished Nagios had.

Also the ability to annotate the events with more info than nagios allows was a win, but something similar could be done in nagios.

I liked it; it just didn't provide the higher level functionality that we needed.

John Rouillard
My employers don't acknowledge my existence much less my opinions.

[Feb 20, 2007] Nagios Network Monitoring

Nagios is frankly not very good, but it's better than most of the alternatives in my opinion. After all, you could spend buckets of cash on HP OpenView or Tivoli and still be faced with the same amount of work to customize it into a useful state....

Among the free alternatives, in my experience Big Brother is too unstable to trust, which makes me loath to buy a license as required for commercial use.

Mon is quite good at monitoring and alerting, but it has all the same problems as Nagios plus a lack of sexy web GUI. I also don't like the way it handles service restoration alerts or blocking outages (dependencies) or multiple concurrent outages.

[Feb 06, 2007] Poor Man's Tech Alternatives to Nagios

For an easy way to get started with Nagios, try GroundWork Monitor Open Source: it unifies Nagios with lots of other open source IT tools and is much easier to set up than vanilla Nagios.

[Jan 4, 2007] Hyperic HQ 3.0.0 Beta 1 (3.x) by John Mark

Written in Java and JavaScript; licensed under the GPL.

About: Hyperic HQ is a distributed infrastructure management system whose architecture assures scalability, while keeping the solution easy to deploy. HQ's design is meant to deliver on the promise of a single integrated management portal capable of managing unlimited types of technologies in environments that range from small business IT departments to the operations groups of today's largest financial and industrial organizations.

Changes: This release features significant new functionality, including Operations Dashboard, a central view of the real-time general health of the entire managed infrastructure.

More powerful alerting is provided with alert escalation, alert acknowledgment, and RSS actions.

Event tracking and correlation provides historical and real-time information from any log resource, configuration file, or security module that can be correlated with availability, utilization, and performance.

[Dec 3, 2006] DeleGate

The idea of using a gateway that provides encryption and all other "high-level" features for communicating with the server is attractive for monitoring.

About: DeleGate is a multi-purpose application level gateway or proxy server that mediates communication of various protocols, applying cache and conversion for mediated data, controlling access from clients, and routing toward servers. It translates protocols between clients and servers, converting between IPv4 and IPv6, applying SSL (TLS) to arbitrary protocols, merging several servers into a single server view with aliasing and filtering. It can be used as a simple origin server for some protocols (HTTP, FTP, and NNTP).

Changes: This version supports "implanted configuration parameters" in the executable file of DeleGate to restrict who can execute the executable and which functions of it are available, or to tailor the executable adapting to the environment in which it is used.

[Sep 20, 2006] Lightweight Conky is a system monitor powerhouse

Conky is a lightweight system monitor that provides essential information in an easy-to-understand, highly customizable interface. The software is a fork of TORSMO, which is no longer maintained. Conky monitors your CPU usage, running processes, memory, and swap usage, and other system information, and displays the information as text or as a graph.

Debian and Fedora users can use apt-get and yum respectively to install Conky. A source tarball is also available.

[Aug 28, 2006] Product Open Source Network & Systems Monitoring

Python-based. Product used by Mercy Hospital of Baltimore and Cablevision of New York. Received $4.8 million in funding in August 2006. A low-cost alternative to monstrous enterprise applications affordable only to large companies.

Zenoss is an IT infrastructure monitoring product that allows you to monitor your entire infrastructure within a single, integrated software application.

Key features include:

[Aug 21, 2006] ZABBIX Monitoring system installation in Debian with Screenshots

ZABBIX is a 24×7 monitoring solution without high cost.

ZABBIX is software that monitors numerous parameters of a network and the health and integrity of servers. ZABBIX uses a flexible notification mechanism that allows users to configure e-mail based alerts for virtually any event. This allows a fast reaction to server problems. ZABBIX offers excellent reporting and data visualization features based on the stored data. This makes ZABBIX ideal for capacity planning.

ZABBIX supports both polling and trapping. All ZABBIX reports and statistics, as well as configuration parameters are accessed through a web-based front end. A web-based front end ensures that the status of your network and the health of your servers can be assessed from any location. Properly configured, ZABBIX can play an important role in monitoring IT infrastructure. This is equally true for small organizations with a few servers and for large companies with a multitude of servers.

[Jun 20, 2006] Zenoss--Open Source Systems Management for SMBs


Eyeing systems management as the next big market to "go open source," Zenoss, Inc. is now trying to give mid-sized customers another alternative beyond the two main choices available so far: massive suites from the "Big Four" giants or a mishmash of specialized point solutions.

"We're focusing on the IT infrastructures of the 'mid-market.' These aren't 'Mom and Pops.' They're organizations with about 50 to 5,000 employees, or $50 million to $500 million in revenues," said Bill Karpovich, CEO of the software firm

Earlier in May, the Zenoss, Inc.-sponsored Zenoss Project joined hands with Webmin, the Emu Software-sponsored NetDirector, and several other open source projects to form the Open Management Consortium (OMC).

Right now, a lot of mid-sized companies and not-for-profits are still struggling to string together effective systems management approaches with specialized tools such as WhatsUp Gold and Ipswitch's software.

Historically, organizations in this bracket have been largely ignored by the "Big Four"--IBM, Hewlett-Packard, BMC, and Computer Associates, according to Karpovich.

"These companies have concentrated mainly on the Fortune 500, and their suites are very heavy and expensive," Karpovich charged, during an interview with LinuxPlanet.

But Karpovich anticipates that the Big Four could start to widen their scope quite soon, spurred by analysts' projections of stellar growth in the systems management space.

Mercy Hospital, a $400 million health care facility in Baltimore, is one medium-sized organization that has already turned down overtures from a Big Four vendor in favor of Zenoss.

"We'd been using a hodgepodge of tools from different vendors," according to Jim Stalder, the hospital's CIO, who cited SolarWinds and Cisco as a couple of examples.

But over the past few years, Mercy's mainly Windows-based IT infrastructure has expanded precipitously, Stalder maintained, in another interview.

Mercy chose Zenoss over a Big Four alternative mostly on the basis of cost, according to the hospital's CIO.

Zenoss doesn't charge for its software, which is offered under GPL licensing, Karpovich said. Instead, its revenue model is built around professional services (including customization, integration, staff training, and best practices consulting) and support fees.

Alternatively, organizations can "use their own resources" or hire other OMC partners or other third-party consultants for professional services.

Zenoss users can also customize the software code for integration or other purposes.

"We used to have 100 servers, but now we have close to 200," Stalder said. "Mercy has done a good job of embracing (advancements in) health care IT. But sometimes your staffing budget doesn't grow as linearly as your infrastructure. And it got difficult to keep tabs on all these servers with fewer (IT) people on hand."

Also according to Karpovich, many organizations--particularly in the midrange tier--don't need all of the features offered in the IBM/HP/BMC/CA suites.

As inspiration behind Zenoss's effort, he pointed to the success of JBoss in the open source application server market, EnterpriseDB and Postgres among databases, and SugarCRM in the CRM arena.

"All of these markets have been moving to open source one by one. And they've all been turned on their heads by really strong vendors. We expect that systems management will be the next place where open source has a big impact, and we want to lead the charge," he told LinuxPlanet.

"We want to do something that's somewhere 'in the middle,' offering a very rich solution with enterprise-grade monitoring at a price mid-sized organizations can afford."

Karpovich maintained that, to step beyond "first-generation" open source tools, Zenoss replaces the traditional ASCII interface with a template-enabled GUI geared to easy systems configurability.

The system also provides autodiscovery and many other features also found in pricier systems.

Zenoss revolves around four key modules: inventory configuration; availability monitoring; performance monitoring; and event management.

The inventory configuration module contains its own autopopulated database. "This is not just an ASCII. We've built a database that understands relationships. For a server, for example, this means, 'What are patches?' There's a real industry trend around ITIL, and we are doing that. A lot of commercial vendors are also talking about CMDB, and we'll be pushing that back toward open source," according to Karpovich.

The availability monitoring in Zenoss is designed to assure that applications "are 'up' and responding," he told LinuxPlanet.

The performance monitoring module makes it possible to track metrics such as disk space over time, and to generate user configurable threshold-based alerts.

The event management capability, on the other hand, offers a centralized area for consolidating events. "Every Windows server has event logging. But we let you bring together events (from multiple servers) and prioritize them," according to the Zenoss CEO.

For his part, Mercy Hospital's Stalder is mainly quite satisfied with Zenoss. "So far, so good. This represented a major savings opportunity for us, and we wouldn't have used a fraction of the features in a (Big Four) suite," he told LinuxPlanet.

"We went live (with Zenoss) in early April, and got it up and running very quickly. We've been able to turn off several other tools, as a result. And Zenoss has shown us several (IT infrastructure) problems we weren't even aware existed," he said.

For example, in rolling up the logs of its SQL Server databases, Mercy found out that several databases weren't being backed up properly.

The hospital did need to turn on SNMP in its servers to get autodiscovery to work. "But this was only because we'd never turned it on before," he added.

Yet Stalder did point to a couple of features on his future wish list for Zenoss. He'd like the software to include notification escalation ("so that if Joe doesn't respond to his pager, you can reach him somewhere else"), as well as a "synthetic transaction generator" to "emulate how the application appears to a user logging on."

But Karpovich readily admits that there's room for more functionality in the Zenoss environment. In fact, that's one of the main reasons behind the decision to join other open source ISVs in founding the OMC, he suggested.

"With our partners, we're building an ecosystem around products and systems integration," he told LinuxPlanet. "We haven't decided yet where all of us will fit. But we want to provide (customers) with all that they need for systems management. In areas where we don't have standards for integration, we can collaborate on integration."

Other founding members of the Open Management Consortium include Nagios, an open source project sponsored by Ayamon; openQRM, sponsored by Qlusters; and openSIMS, sponsored by Symbiot.

The consortium also plans to create a "systems integration repository around best practices for sharing instrumentation," Karpovich said.

"The business model is kind of like that of SugarCRM. Partners will build their own businesses selling services. Then, if one of their customers wants Zenoss, for example, the partner will get a commission," he elaborated.

But Zenoss will also do its best to avoid the bloatware phenomenon associated with the Big Four suites, according to Karpovich.

"One of the things people don't like about the 'Big Four' is that if they don't buy capabilities now, it will cost them more later. With Zenoss, you're not under that kind of pressure," the CEO told LinuxPlanet.

[Jun 20, 2006] Server Monitoring With BixData HowtoForge - Linux Howtos and Tutorials

[June 20, 2006] BixData Cluster and Systems Management Try our free full featured Community edition. It supports up to 30 machines.

BixData addresses the major areas of management and monitoring:

  • System management
  • Application monitoring
  • Network monitoring
  • Performance monitoring
  • Hardware monitoring

[Jun 8, 2006] Host Grapher

Host Grapher is a very simple collection of Perl scripts that provide graphical display of CPU, memory, process, disk, and network information for a system.

There are clients for Windows, Linux, FreeBSD, SunOS, AIX and Tru64. No socket will be opened on the client, nor will SNMP be used for obtaining the data.

[May 9, 2006] Open source vendors create sys man consortium - Computer Business Review

Six of the leading open source systems management vendors are to announce that they have created a new consortium to further the adoption of open source systems management software and develop open standards.

The Open Management Consortium has been founded by a group of open source systems management and monitoring players, including

[Apr 20, 2006] Server Monitoring With munin And monit HowtoForge by Falko Timme

04/20/2006 | Linux Howtos and Tutorials

In this article I will describe how to monitor your server with munin and monit. munin produces nifty little graphics about nearly every aspect of your server (load average, memory usage, CPU usage, MySQL throughput, eth0 traffic, etc.) without much configuration, whereas monit checks the availability of services like Apache, MySQL and Postfix, and takes the appropriate action, such as a restart, if it finds that a service is not behaving as expected. The combination of the two gives you full monitoring: graphics that let you recognize current or upcoming problems (like "We need a bigger server soon, our load average is increasing rapidly."), and a watchdog that ensures the availability of the monitored services.
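The "watchdog" half of this setup, restarting a service that stops responding, can be sketched in a few lines of Python. This is a hypothetical illustration and not monit's implementation or configuration syntax. The health check is any function that returns True or False. The restart command (an init script path in the test) is an assumption; replace it with whatever starts your service. Requiring several consecutive failures before a restart avoids bouncing a service because of a single slow response.

```python
import subprocess
from typing import Callable, Sequence

class Watchdog:
    """Restart a service after `max_failures` consecutive failed health checks.

    Loosely modeled on the restart-on-failure idea described for monit;
    `restart_cmd` is a hypothetical command supplied by the caller.
    """

    def __init__(self, check: Callable[[], bool], restart_cmd: Sequence[str],
                 max_failures: int = 3, run: Callable = subprocess.run):
        self.check = check
        self.restart_cmd = list(restart_cmd)
        self.max_failures = max_failures
        self.failures = 0
        self.run = run  # injectable, so the logic can be tested without restarting anything

    def cycle(self) -> str:
        """Run one monitoring cycle and report what happened."""
        if self.check():
            self.failures = 0  # any success resets the failure counter
            return "ok"
        self.failures += 1
        if self.failures < self.max_failures:
            return "failing"
        self.run(self.restart_cmd, check=False)
        self.failures = 0
        return "restarted"
```

In practice you would call `cycle()` from a loop or a cron-driven script. `check` could, for example, try a TCP connection to the service's port.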

[Apr 17, 2006]

Among the network-management start-ups that received second rounds of funding:
  • Cittio: WatchTower, enterprise monitoring and management software. Latest funding: March 2006, $8 million from JK&B Capital and Hummer Winblad Venture Partners.
  • GroundWork Open Source Solutions: GroundWork Monitor Professional, an IT monitoring tool based on open source software. Latest funding: March 2005, $8.5 million from Mayfield and Canaan Partners.
  • LogLogic: LogLogic 3, an appliance that aggregates and stores log data. Latest funding: September 2004, $13 million from Sequoia Capital, Telesoft Partners and Worldview Technology Partners.
  • Splunk: Splunk, downloadable software to search logs generated by hardware and software. Latest funding: January 2006, $10 million from JK&B Capital.

[Apr 10, 2006] moodss

Moodss is a modular monitoring application that supports operating systems (Linux, UNIX, Windows, etc.), databases (MySQL, Oracle, PostgreSQL, DB2, ODBC, etc.), networking (SNMP, Apache, etc.), and any device or process for which a module can be developed (in Tcl, Python, Perl, Java, and C). An intuitive GUI with full drag'n'drop support allows the construction of dashboards with graphs, pie charts, etc., while the thresholds functionality includes emails and user-defined scripts. Monitored data can be archived in an SQL database by both the GUI and the companion daemon, so that a complete history over time can be made available from Web pages or common spreadsheet software. The data can even be used for future behavior prediction or capacity planning with the included predictor tool, which is based on statistical methods and artificial neural networks.

[Apr 10, 2006] Browse project tree - Topic System Monitoring

Big Sister is a Perl-based, SNMP-aware monitoring program consisting of a Web-based server and a monitoring agent. It runs under various Unixes and Windows.

[Apr 05, 2006] Splunk Base brings IT troubleshooting to the IT masses

To better understand Splunk Base, look no further than the online encyclopedia Wikipedia.

Like Wikipedia, Splunk Base provides a global repository of user-regulated information, but the similarities end there. Splunk Inc. will formally unveil Splunk Base this week at the LinuxWorld 2006 Conference: a free community stockpile of error messages and troubleshooting tips for IT professionals, from IT professionals, covering any system they can get their hands on.

At the head of this community effort is Splunk's chief community Splunker Patrick McGovern, who picked up much of his community experience while working with developers when he managed the open source project repository

Now at Splunk, McGovern manages Splunk Base, a global wiki of IT events that grants IT workers access to information about specific events recorded by any application, system or device.

[Mar 24, 2006] Project details for monit

Monit is a utility for managing and monitoring processes, files, directories, and devices on a Unix system. It conducts automatic maintenance and repair and can execute meaningful causal actions in error situations. It can be used to monitor files, directories, and devices for changes, such as timestamp changes, checksum changes, or size changes. It is controlled via an easy-to-configure control file based on a free-format, token-oriented syntax. It logs to syslog or to its own log file and notifies users about error conditions via customizable alert messages. It can perform various TCP/IP network checks and protocol checks, and can use SSL for such checks. It provides an HTTP(S) interface for access.
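File change detection of the kind described above (timestamp, checksum, and size) can be sketched in Python by comparing two snapshots of a file's metadata and content hash. This is not monit's code. The choice of SHA-256 and the `FileState` and `changes` names are assumptions made for illustration.

```python
import hashlib
import os
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class FileState:
    size: int
    mtime_ns: int
    sha256: str

def snapshot(path: str) -> FileState:
    """Record size, modification time and content checksum of a file."""
    st = os.stat(path)
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):  # hash large files in chunks
            digest.update(chunk)
    return FileState(st.st_size, st.st_mtime_ns, digest.hexdigest())

def changes(old: FileState, new: FileState) -> List[str]:
    """List which attributes differ between two snapshots."""
    found = []
    if old.size != new.size:
        found.append("size")
    if old.mtime_ns != new.mtime_ns:
        found.append("timestamp")
    if old.sha256 != new.sha256:
        found.append("checksum")
    return found
```

The checksum is what catches tampering that preserves size and timestamp. It is also the expensive part, which is why a monitor might check it less often than the cheap `stat` fields.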

[Dec 8, 2005] Zabbix by Alexei Vladishev - Not exactly Perl (written in PHP+C) but still an interesting product...

About: Zabbix is software that monitors your servers and applications. Polling and trapping techniques are both supported. It has a simple, yet very flexible notification mechanism, and a Web interface that allows quick and easy administration. It can be used for logging, monitoring, capacity planning, availability and performance measurement, and providing the latest information to a helpdesk.

Changes: This release introduces automatic refresh of unsupported items, support for SNMP Counter64, new naming schema for ZABBIX agent's parameters, more flexible user-defined parameters for UserParameters, double-sided graphs, configurable refresh rate, and other enhancements.

User comment on ZABBIX
by LEM - Nov 17th 2004 05:07:23

Excellent _product_:
  • easy to install and configure
  • easy to customize
  • easy to use
  • very good functionality (multiple maps, availability, trigger/alert dependencies, SLA calculation)
  • uses very few resources

I've been using ZABBIX to monitor about 500 elements (servers, routers, switches...) in a heterogeneous environment (Windows, Unix variants, SNMP-aware equipment).

An excellent alternative to Nagios and MoM+Minautore.

Best network monitor I've seen
by robertj - Feb 7th 2003 15:29:38

This is a GREAT project. Best monitor I've seen. Puts Big Brother monitoring to shame.

[Dec 1, 2005] MoSSHe (MOnitoring with SSH Environment).

MoSSHe (MOnitoring with SSH Environment) is a simple, lightweight (both in size and system requirements), Python-based server monitoring package designed for secure and in-depth monitoring of a number of typical Internet systems.

It was developed to keep the impact on network and performance low, and to use a safe, encrypted connection for in-depth inspection of the system checked. It is not possible to remotely run (more or less arbitrary) commands via the monitoring system, nor is unsafe cleartext SNMP messaging necessary (though it remains possible). A read-only Web interface makes monitoring and status checks simple (and safe) for admins and helpdesk.

Checking scripts are included for remote services (DNS, HTTP, IMAP2, IMAP3, POP3, samba, SMTP, and SNMP) and local systems (disk space, load, CPU temperature, fan speed, free memory, print queue size and activity, processes, RAID status, and shells).
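The SSH-based approach can be illustrated with a short Python sketch. It runs a read-only command on a remote host over ssh and parses the result locally. This is a hypothetical example, not MoSSHe's code. It assumes key-based ssh authentication is already set up, and it shows only the disk-space check, relying on the POSIX `df -P` output format (filesystem, total blocks, used, available, capacity, mount point).

```python
import subprocess
from typing import Dict

def parse_df_p(output: str) -> Dict[str, int]:
    """Map mount point -> percent used, from POSIX `df -P` output."""
    usage = {}
    for line in output.strip().splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) >= 6 and fields[4].endswith("%"):
            mount_point = " ".join(fields[5:])  # tolerate spaces in mount points
            usage[mount_point] = int(fields[4].rstrip("%"))
    return usage

def remote_disk_usage(host: str) -> Dict[str, int]:
    """Run `df -P` on `host` over ssh and parse the result."""
    # BatchMode=yes makes ssh fail instead of prompting for a password.
    result = subprocess.run(["ssh", "-o", "BatchMode=yes", host, "df", "-P"],
                            capture_output=True, text=True, timeout=30, check=True)
    return parse_df_p(result.stdout)
```

To keep the channel read-only, one option is to restrict the monitoring key with OpenSSH's `command=` option in the remote `authorized_keys` file. That way the key can only ever run the check command, which matches the design goal stated above of not allowing arbitrary remote commands.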

[May 25, 2005] SEC - open source and platform independent event correlation tool

SEC is an open source and platform independent event correlation tool that was designed to fill the gap between commercial event correlation systems and homegrown solutions that usually comprise a few simple shell scripts. SEC accepts input from regular files, named pipes, and standard input, and can thus be employed as an event correlator for any application that is able to write its output events to a file stream. The SEC configuration is stored in text files as rules, each rule specifying an event matching condition, an action list, and optionally a Boolean expression whose truth value decides whether the rule can be applied at a given moment.

Regular expressions, Perl subroutines, etc. are used for defining event matching conditions. SEC can produce output events by executing user-specified shell scripts or programs (e.g., snmptrap or mail), by writing messages to pipes or files, and by various other means.
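SEC rules are written in SEC's own configuration language, not Python. Still, the core idea of a threshold-style rule can be sketched as follows: a match condition, plus an action that fires when enough matching events arrive within a sliding time window. The class name, pattern, counts, and window below are hypothetical and illustrative only.

```python
import re
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque

@dataclass
class ThresholdRule:
    """Fire `action` when `pattern` matches `threshold` times within `window` seconds."""
    pattern: str
    threshold: int
    window: float
    action: Callable[[str], None]
    _hits: Deque[float] = field(default_factory=deque, init=False, repr=False)

    def feed(self, timestamp: float, line: str) -> bool:
        """Process one input event; return True if the action fired."""
        if not re.search(self.pattern, line):
            return False
        self._hits.append(timestamp)
        # Drop matches that have fallen out of the sliding window.
        while self._hits and timestamp - self._hits[0] > self.window:
            self._hits.popleft()
        if len(self._hits) >= self.threshold:
            self.action(line)
            self._hits.clear()  # reset so one burst produces one alert
            return True
        return False
```

For example, `ThresholdRule(r"Failed password", 3, 60.0, print)` would print the third failed-login line that arrives within a minute. Non-matching lines never count toward the threshold.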

[Dec 22, 2004] Building Linux Monitoring Portals with Open Source

Faced with an increasing number of deployed Linux servers and no budget for commercial monitoring tools, our company looked into open-source solutions for gathering performance and security information from our Unix environment. There are many open-source monitoring packages to choose from, including Big Sister and Nagios to name a few. Though some of these try to provide an all-in-one solution, I knew we would probably end up combining a few tools to obtain the metrics we were looking for. This article is meant to give a general overview of the steps in building a monitoring solution. Take a look at the demo here which is a scaled down model of our production monitoring portal.

Required Packages

I started out with a base Red Hat ES 3.0 installation, but any flavor of Linux will work. Depending on your distro, some of the above required packages might already be installed, particularly libpng, zlib and gd. You can check whether any of these are installed by issuing the following from the command line:

rpm -qa | grep packagename

I selected MRTG (Multi-Router Traffic Grapher) for the base statistics engine. This tool is mainly used for tracking statistics on network devices, but it can easily be modified to track performance metrics on your Unix or Windows servers. The instructions for installing MRTG on Unix can be found here. The gd, libpng and zlib packages must be compiled and installed before MRTG can be fired up. Even if you have already installed them, compiling MRTG against the default package installations will probably produce complaints about various things, including GD locations. For your sanity, install these packages from scratch using the instructions from the MRTG website, since they require specific "--" options when compiled. If you're feeling creative, you can also rebuild the SRPMs from source. Be sure to exclude these packages in the up2date or Yum configuration files; otherwise, when updates to these packages become available, the update tool will overwrite your custom RPMs.

RRDtool is used as a backend database to store the statistics gathered by MRTG. MRTG collects its data through SNMP and by default stores it in text files. This method is fine for a few servers, but when your environment starts growing, you'll need a faster way of reading and storing data. RRDtool (Round Robin Database tool) stores server statistics in a compact, fixed-size database. Future versions of MRTG are going to use this format by default, so you might as well start using it now.
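RRDtool stays compact because each archive holds a fixed number of rows, and new data overwrites the oldest row. The in-memory Python sketch below illustrates that round-robin idea together with AVERAGE consolidation of several samples into one row. It is not RRDtool's file format or API; the class and parameter names are hypothetical.

```python
from collections import deque
from statistics import mean
from typing import List

class RoundRobinArchive:
    """Fixed-size archive: each row averages `steps_per_row` raw samples.

    Once `rows` rows exist, each new row pushes out the oldest one,
    so storage never grows no matter how long data is collected.
    """

    def __init__(self, rows: int, steps_per_row: int = 1):
        self._rows = deque(maxlen=rows)  # deque silently discards the oldest entry
        self._steps_per_row = steps_per_row
        self._pending: List[float] = []

    def update(self, value: float) -> None:
        self._pending.append(value)
        if len(self._pending) == self._steps_per_row:
            self._rows.append(mean(self._pending))  # AVERAGE consolidation
            self._pending.clear()

    def values(self) -> List[float]:
        return list(self._rows)
```

With 5-minute samples, `RoundRobinArchive(rows=288)` would keep one day at full resolution. A second `RoundRobinArchive(rows=336, steps_per_row=6)` fed the same samples would keep a week of 30-minute averages. This mirrors how several archives of different resolutions sit side by side in one RRD.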

Angelfire is a great front-end tool for monitoring servers via ICMP and services running over TCP. This Perl program runs from cron and generates an HTML table that contains the status of your devices. Color bars represent the status of each server (Green = GOOD, Yellow = LATENCY > 100 ms, Red = UNREACHABLE).
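The color scheme above maps naturally onto a small classification function. The Python sketch below is a hypothetical illustration, not Angelfire's code. It also swaps in TCP connect time for ICMP ping, because sending ICMP echo requests from Python normally requires raw sockets and root privileges. The 100 ms threshold is the one given in the text.

```python
import socket
import time
from typing import Optional

LATENCY_WARN_S = 0.100  # Yellow above 100 ms, as in the color scheme above

def tcp_latency(host: str, port: int, timeout: float = 2.0) -> Optional[float]:
    """Return TCP connect time in seconds, or None if the service is unreachable."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:  # refused, timed out, no route, DNS failure...
        return None

def status_color(latency: Optional[float]) -> str:
    """Map a measured latency to the GREEN/YELLOW/RED status bar."""
    if latency is None:
        return "RED"
    return "YELLOW" if latency > LATENCY_WARN_S else "GREEN"
```

A cron-driven script could call `status_color(tcp_latency(host, 80))` for each monitored host and write the results into an HTML table.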

For Apache, I used the default installation that comes with Red Hat. No need to install a fresh copy plus it will be easier to maintain for updates using RHN.

Proactive security checks are a mandatory part of system administration these days. Nessus is a great vulnerability scanner, and its HTML output options make incorporating it into the portal very easy.

[Sep 10, 2004] TECH SUPPORT Impress the Boss with Cacti

September 2004 | Linux Magazine

When using Linux in a business environment, it's important to monitor resource utilization. System monitoring helps with capacity planning, alerts you to performance problems, and generally makes managers happy.

So, in this month's "Tech Support," let's install Cacti, a resource monitoring application that utilizes RRDtool as a back-end. RRDTool stores and displays time-series data, such as network bandwidth, machine-room temperature, and server load average. With Cacti and RRDtool, you can graph system performance in a way that will not only make it more useful, it'll also impress your pointy-haired boss.

Start with RRDtool. It was written by Tobi Oetiker (of MRTG fame), is licensed under the GNU General Public License (GPL), and is freely available for download. Build and install the software with:

$ ./configure; make
# make install; make site-perl-install

To ease upgrades, you should also link /usr/local/rrdtool to the /usr/local/rrdtool-version directory created by make install.

Now that you have RRDtool installed, you're ready to install Cacti. Cacti is a complete front-end to RRDtool (based on PHP and MySQL) that stores all of the information necessary to create and populate performance graphs. Cacti uses templates, supports multiple graphing hierarchies, and has its own user-based authentication system, which allows administrators to create users and assign them different permissions to the Cacti interface. Also licensed under the GPL, Cacti is freely available for download.

The first step to install Cacti is to unpack its tarball into a directory accessible via your web server. Next, create a MySQL database and user for Cacti (this article uses cacti as the database name). Optionally, you can also create a system account to run Cacti's cron jobs.

Once the Cacti database is created, import its contents by running mysql cacti < cacti.sql. Depending on your MySQL setup, you may need to supply a username and password for this step.

After you've imported the database, edit include/config.php and specify your Cacti MySQL database information. Also, if you plan to run Cacti as a user other than the one you're installing it as, set the appropriate permissions on Cacti's directories for graph/log generation. To do this, type chown cactiuser rra/ log/ in the Cacti directory.

You can now create the following cron job...

*/5 * * * * /path/to/php /path/to/www/cacti > /dev/null 2>&1

... replacing /path/to/php with the full pathname to your command-line PHP binary and /path/to/www/cacti with the web accessible directory you unpacked the Cacti tarball into.

Now, point your web browser to http://your-server/cacti/ and log in with the default username and password of admin and admin. You must change the administrator password immediately. Then, make sure you carefully fill in all of the path variables on the next screen.

By default, Cacti monitors only a few items, such as load average, memory usage, and number of processes. While Cacti comes preconfigured with some additional data input methods and understands SNMP if you have it installed, its power lies in the fact that you can graph data created by an arbitrary script. Lists of contributed scripts are available online, and you can easily write a script for almost anything.

To create a new graph, click on the "Console" tab and create a data input method to tell Cacti how to call the script and what to expect from it. Next, create a data source to tell Cacti how and where the data is stored, and create a graph to tell Cacti how to display the data. Finally, add the new graph to the "Graph View" to see the results.
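A script used as a Cacti data input method simply prints its result on standard output. As I understand Cacti's documentation for script-based data input methods, the output should be a single value or, for several values, space-separated `name:value` pairs whose names match the output fields defined for the data input method. Check the documentation for the version you run. Below is a minimal Python sketch that reports the load average this way. The field names `load_1min`, `load_5min` and `load_15min` are hypothetical and must match whatever you define in Cacti.

```python
import os
from typing import Mapping

def cacti_output(fields: Mapping[str, float]) -> str:
    """Render values as space-separated name:value pairs for a Cacti script data input."""
    return " ".join(f"{name}:{value}" for name, value in fields.items())

if __name__ == "__main__":
    # Field names are hypothetical; they must match the output fields defined in Cacti.
    one, five, fifteen = os.getloadavg()  # available on Unix-like systems only
    print(cacti_output({"load_1min": one, "load_5min": five, "load_15min": fifteen}))
```

Keeping the formatting in its own function makes it easy to reuse the same output convention for other metrics.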

While Cacti is a very powerful program, many other applications also utilize the power of RRDtool, including Cricket, FlowScan, OpenNMS, and SmokePing. Cricket is a high performance, extremely flexible system for monitoring trends in time-series data. FlowScan analyzes and reports on Internet Protocol (IP) flow data exported by routers and produces graph images that provide a continuous, near real-time view of network border traffic. OpenNMS is an open source project dedicated to the creation of an enterprise grade network management platform. And SmokePing measures latency, latency distribution, and packet loss in your network.

A comprehensive list of front-ends available for RRDtool is maintained online. Using some of these RRDtool-based applications in your environment will not only make your life easier, it may even get you a raise!

[Sep 10, 2004] Spong -- systems and Network Monitoring


What is Spong?

Spong is a simple systems and network monitoring package. It does not compete with Tivoli, OpenView, UniCenter, or any other commercial packages. It is not SNMP-based; it communicates via simple TCP-based messages. It is written in Perl and currently runs on every major Unix and Unix-like operating system.


  • client based monitoring (CPU, disk, processes, logs, etc.)
  • monitoring of network services (smtp, http, ping, pop, dns, etc.)
  • grouping of hosts (routers, servers, workstations, PCs)
  • rules based messaging when problems occur
  • configurable on a host by host basis
  • results displayed via text or web based interface
  • history of problems
  • verbose information to help diagnose problems
  • modular programs that make it easy to add or replace check functions or features
  • Big Brother BBSERVER emulation to allow Big Brother Clients to be used

Sample Spong Setup

This is my development Spong setup on my home network, running Spong version 2.7. Many new features have been added since version 2.6f, but if you click on the "Hosts" link in the top frame, you will get a good feel for how Spong 2.6f looks and works.


Spong is free software released under the GNU General Public License or the Perl Artistic License. You may choose whichever license is appropriate for your usage.


Don't let the amount of documentation scare you; I still think Spong is simple to set up and use.

Documentation for Spong is included with every release. For version 2.6f, the documentation is in HTML format located in the www/docs/ directory and is self contained (the links will still work if you move it), so you should be able to copy it to whatever location that you want. An online copy of the documentation is available here.

The documentation for Spong 2.7 is not complete. It is undergoing a complete rewrite into POD format. This change will enable the documentation to be converted into a multitude of different formats (e.g. HTML, man, text).

Release Notes / Changes

The CHANGES file for each release serves as the release notes and change log for that version of Spong. The CHANGES file for Spong 2.6f is available here and the CHANGES file for Spong 2.7 is available here.

[Sep 10, 2004] Argus Monitoring System


Argus is a system and network monitoring application. It will monitor nearly anything you ask it to monitor (TCP + UDP applications, IP connectivity, SNMP OIDS, etc). It presents a clean, easy-to-view Web interface. It can send alerts numerous ways (such as via pager) and can automatically escalate if someone falls asleep.
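Automatic escalation (the same feature Mercy Hospital's Stalder wished for in Zenoss) boils down to notifying progressively more people the longer an alert stays unacknowledged. Here is a minimal Python sketch under stated assumptions. It is not Argus code, and the escalation chain, contact names, and minute thresholds are hypothetical.

```python
from typing import List, Sequence, Set, Tuple

# Hypothetical escalation chain: (minutes unacknowledged before this step, contact)
ESCALATION: Sequence[Tuple[int, str]] = (
    (0, "oncall-pager"),
    (15, "oncall-phone"),
    (30, "team-lead"),
)

def due_notifications(minutes_unacked: float, already_notified: Set[str],
                      chain: Sequence[Tuple[int, str]] = ESCALATION) -> List[str]:
    """Return contacts whose escalation step has been reached but who were not yet notified."""
    return [who for after, who in chain
            if minutes_unacked >= after and who not in already_notified]
```

A scheduler would call this periodically for each open alert, send the returned notifications, and add those contacts to `already_notified`. Acknowledging the alert simply stops the calls.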

[Apr 10, 2004] RRDutil

RRDutil is a tool to collect statistics (typically every 5 minutes) from multiple servers, store the values in RRD databases (using RRDtool), and plot graphs to a Web server on demand. The graph types shown include CPU, memory, disk (space and I/O), Apache, MySQL queries and query types, email, Web hits, and more.

Recommended Links

Softpanorama Recommended

Top articles


Network Monitoring Tools by Les Cottrell

This is a far more comprehensive page than this one, with a slightly different focus, although host monitoring and network monitoring now largely overlap.

This is a list of tools used for Network (both LAN and WAN) Monitoring tools and where to find out more about them. The audience is mainly network administrators. You are welcome to provide links to this web page. Please do not make a copy of this web page and place it at your web site since it will quickly be out of date.

Network Discovery Tool Software

O'Reilly Network Top Five Open Source Packages for System Administrators

Browse project tree - Topic System Monitoring

Argus Monitoring System

Argus is a system and network monitoring application. It will monitor nearly anything you ask it to monitor (TCP + UDP applications, IP connectivity, SNMP OIDS, etc). It presents a clean, easy-to-view Web interface. It can send alerts numerous ways (such as via pager) and can automatically escalate if someone falls asleep.

Big Sister

Big Sister is an SNMP-aware monitoring program consisting of a Web-based server and a monitoring agent. It runs under various Unixes and Windows. Big Sister will:

  • monitor networked systems
  • provide a simple view on the current network status
  • notify you when your systems are becoming critical
  • generate a history of status changes
  • log and display a variety of system performance data
Sys Admin > Using Email to Perform UNIX System Monitoring and Control

SSH-based monitoring

[Sep 13, 2004] moodss (first listed May 8, 1998; written in C, Perl, Python, and Tcl)

Moodss is a modular monitoring application, which supports operating systems (Linux, UNIX, Windows, etc.), databases (MySQL, Oracle, PostgreSQL, DB2, ODBC, etc.), networking (SNMP, Apache, etc.), and any device or process for which a module can be developed (in Tcl, Python, Perl, Java, and C).

An intuitive GUI with full drag'n'drop support allows the construction of dashboards with graphs, pie charts, etc., while the thresholds functionality includes email warnings and user-defined scripts. Any part of the visible data can be archived in an SQL database by both the GUI and the companion daemon, so that a complete history over time can be made available from Web pages, common spreadsheet software, etc.



MoSSHe (MOnitoring with SSH Environment), written in Python

MoSSHe (MOnitoring with SSH Environment) is a simple, lightweight (both in size and system requirements) server monitoring package designed for secure and in-depth monitoring of a number of typical Internet systems. It was developed to keep the impact on network and performance low, and to use a safe, encrypted connection for in-depth inspection of the system checked. It is not possible to remotely run (more or less arbitrary) commands via the monitoring system, nor is unsafe cleartext SNMP messaging necessary (though it remains possible). A read-only Web interface makes monitoring and status checks simple (and safe) for admins and helpdesk. Checking scripts are included for remote services (DNS, HTTP, IMAP2, IMAP3, POP3, samba, SMTP, and SNMP) and local systems (disk space, load, CPU temperature, fan speed, free memory, print queue size and activity, processes, RAID status, and shells).

Commercial Monitoring Tools



nPULSE is a Web-based network monitoring package for Unix-like operating systems. It can quickly monitor up to thousands of sites/devices at a time on multiple ports. nPULSE is written in Perl and comes with its own (SSL optional) Web server for extra security.

Sentinel System Monitor

Sentinel System Monitor is a plugin-based, extendable remote system monitoring utility that focuses on central management and flexibility while still being fully-featured. Stubs are used to allow remote monitoring of machines using probes. Monitoring can support multiple architectures because the monitoring probes are filed by a library process that hands out probes based on OS/arch/hostname. Execution of blocks can be triggered by either test failure or success.

It uses XML for configuration and OO Perl for most programming. Support for remote command execution via plugins allows reaction blocks to be created that can try and repair possible problems immediately, or just notify an administrator that there is a problem.

Open (Source|System) Monitoring and Reporting Tool

OpenSMART is a monitoring (and reporting) environment for servers and applications in a network. Its main features are a nice Web front end, monitored servers requiring only a Perl installation, XML configuration, and good documentation. It is easy to write more checks. Supported platforms are Linux, HP/UX, Solaris, *BSD, and Windows (only as a client).


InfoWatcher is a system and log monitoring program written in Perl. The major components of InfoWatcher are SLM and SSM. SLM is a log monitoring and filter daemon process which can monitor multiple logfiles simultaneously, and SSM is a system/process monitoring utility that monitors general system health, process status, disk usage, and others. Both programs are easily configurable and extensible.
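Monitoring several log files at once, as SLM does, mostly comes down to remembering how far each file has been read and scanning only the newly appended lines. The Python sketch below is a hypothetical illustration, not InfoWatcher's code. It polls the files rather than reacting to change notifications, it consumes only complete lines (so a half-written entry is picked up on the next poll), and it treats a file that shrank as rotated or truncated.

```python
import os
import re
from typing import Dict, List, Tuple

class LogWatcher:
    """Poll several log files and return newly appended lines matching a pattern."""

    def __init__(self, paths: List[str], pattern: str):
        self.pattern = re.compile(pattern)
        # Start at the current end of each file so old entries are not re-reported.
        self.offsets: Dict[str, int] = {
            p: os.path.getsize(p) if os.path.exists(p) else 0 for p in paths}

    def poll(self) -> List[Tuple[str, str]]:
        hits = []
        for path, offset in self.offsets.items():
            if not os.path.exists(path):
                continue
            if os.path.getsize(path) < offset:  # truncated or rotated: start over
                offset = 0
            with open(path, "rb") as f:
                f.seek(offset)
                data = f.read()
            end = data.rfind(b"\n") + 1  # only consume complete lines
            for raw in data[:end].splitlines():
                line = raw.decode("utf-8", errors="replace")
                if self.pattern.search(line):
                    hits.append((path, line))
            self.offsets[path] = offset + end
        return hits
```

Detecting rotation by a shrinking size is a simplification. A more careful watcher would also compare inode numbers, because a rotated file can quickly grow past the old offset.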

Network And Service Monitoring System

Network and Service Monitoring System is a tool for assisting network administrators in managing and monitoring the activities of their network. It helps in getting the status information of critical processes running at any machine in the network.

It can be used to monitor the bandwidth usage of individual machines in the network. It also performs checks for IP-based network services like POP3, SMTP, NNTP, FTP, etc., and can give you the status of the DNS server. The system uses MySQL for storing the information, and the output is displayed via a Web interface.

Kane Secure Enterprise

This should do everything you require. I also suggest you check out Andy's great IDS site (that's another fiver you owe me, Andy).

The best I can recommend is Medusa DS9. It's configurable and makes a machine secure. The computer running Medusa uses old BIND (version 8) and old Sendmail (version 8.10?) with no patches, on Linux 2.2.5, and it was not rooted for nearly two years...

GMem 0.2

Gmem is a tool that uses GTK progress bars to display your system's memory usage and uptime, read from the proc filesystem. It's configurable and user-friendly.
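On Linux, the memory figures such a tool shows come from `/proc/meminfo`, where each line has a field name, a colon, and a value (usually in kB). Below is a minimal Python sketch of parsing it and computing a used-memory percentage. It is not Gmem's code, and the helper names are hypothetical. `MemAvailable` only exists on Linux 3.14 and later, so the sketch falls back to `MemFree` on older kernels.

```python
from typing import Dict

def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo content into {field: value}, values usually in kB."""
    info = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            info[name.strip()] = int(parts[0])
    return info

def used_percent(info: Dict[str, int]) -> float:
    """Percentage of memory in use, preferring MemAvailable when present."""
    # MemAvailable exists on Linux 3.14+; older kernels only have MemFree.
    available = info.get("MemAvailable", info.get("MemFree", 0))
    return 100.0 * (info["MemTotal"] - available) / info["MemTotal"]
```

On a live system you would read the file and pass its contents in, e.g. `used_percent(parse_meminfo(open("/proc/meminfo").read()))`.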

Benson Distributed Monitoring System

The goal of the Benson Distributed Monitoring System project is to make a distributed monitoring system with the extensibility and flexibility of mod_perl. The end goal is for system administrators to be able to script up their own alerts and monitors into an extensible framework which hopefully lets them get sleep at night. The communication layer uses standard sockets, and the scripting language for the handlers is Perl. It includes command line utilities for sending, listing, and acknowledging traps, and starting up the benson system. There is also a Perl module interface to the benson network requests.

Network And Service Monitoring System

Developer: Sreehari Nair


Monfarm is an alarm-enabled monitoring system for server farms. It produces dynamically updated HTML status pages showing the availability of servers. Alarms are generated if servers become unavailable.

Open (Source|System) Monitoring and Reporting Tool
A monitoring tool with few dependencies, nice frontend, and easy extensibility.

Demarc PureSecure
An all-inclusive network monitoring client/server program and Snort frontend.

Percival Network Monitoring System

Framework for distributed system and network monitoring




The Last but not Least: Technology is dominated by two types of people: those who understand what they do not manage and those who manage what they do not understand. ~Archibald Putt, Ph.D.

Copyright © 1996-2018 by Dr. Nikolai Bezroukov. This site was initially created as a service to the (now defunct) UN Sustainable Development Networking Programme (SDNP) in the author's free time and without any remuneration. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License. Copyright of original materials belongs to their respective owners. Quotes are made for educational purposes only, in compliance with the fair use doctrine.

FAIR USE NOTICE This site contains copyrighted material the use of which has not always been specifically authorized by the copyright owner. We are making such material available to advance understanding of computer science, IT technology, economic, scientific, and social issues. We believe this constitutes a 'fair use' of any such copyrighted material as provided by section 107 of the US Copyright Law according to which such material can be distributed without profit exclusively for research and educational purposes.

This is a Spartan WHYFF (We Help You For Free) site written by people for whom English is not a native language. Grammar and spelling errors should be expected. The site contains some broken links, as it develops like a living tree.



The statements, views and opinions presented on this web page are those of the author (or referenced source) and are not endorsed by, nor do they necessarily reflect, the opinions of the author's present and former employers, SDNP, or any other organization the author may be associated with. We do not warrant the correctness of the information provided or its fitness for any purpose.

The site uses AdSense, so you need to be aware of Google's privacy policy. If you do not want to be tracked by Google, please disable JavaScript for this site. The site is perfectly usable without JavaScript.

Last updated: February 07, 2019