Nagios was formerly known as NetSaint. It was created by Ethan Galstad, who has developed it for more than a decade, starting in 1999 (see History). It looks like this talented author had no previous experience with commercial monitoring systems. As a result the system is somewhat idiosyncratic and has some interesting solutions for common problems.
Nagios is probably the most popular open source monitoring system in existence today, and is generally credited with popularizing the centralized polling model. Under the hood, Nagios is really just a special-purpose scheduling and notification engine. By itself, it can’t monitor anything. All it can do is schedule the execution of small programs called plug-ins, each of which monitors specific objects, and take actions based on their output.
Nagios plug-ins return one of four states:
0 for “OK,”
1 for “Warning,”
2 for “Critical,” and
3 for “Unknown.”
The Nagios daemon can be configured to react to these return codes, notifying administrators via email or SMS. It also displays the status of the servers on the Web console (if this part is installed), although currently those facilities are somewhat basic and not as flexible as, say, stock monitoring in Yahoo Finance (which actually is a pretty similar area, just different metrics ;-).
In addition to the return code, each plug-in can also return a line of text, which will be captured by the daemon, written to a log, and, optionally, displayed in the UI. If the daemon finds a pipe character in the text returned by a plug-in, the first part is treated normally, and the second part is treated as performance data.
Performance data doesn’t really mean anything to Nagios; it won’t, for example, enforce any rules on it or interpret it in any way. The key point is that Nagios can be configured to split the line in two parts and handle the post-pipe text differently than pre-pipe text.
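The split described above can be sketched in a few lines of shell. This is only an illustration of the convention, not the code Nagios itself uses; the function names plugin_text and plugin_perfdata are made up for this example, and whitespace around the pipe is left untouched.

```shell
#!/bin/sh
# Split one line of plug-in output at the first pipe character.
# Text before the pipe is the human-readable status;
# text after it is the performance data.

plugin_text() {
    # Remove the first "|" and everything that follows it.
    printf '%s\n' "${1%%|*}"
}

plugin_perfdata() {
    case "$1" in
        *"|"*) printf '%s\n' "${1#*|}" ;;  # keep what follows the first "|"
        *)     printf '\n' ;;              # no pipe: no performance data
    esac
}
```

For a line such as `PING OK - rta 0.5ms|rta=0.5ms;100;200` the first function prints the status text and the second prints `rta=0.5ms;100;200`.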
Logically Nagios can be split into two parts:
Nagios architecture has certain (correctable) deficiencies:
In other words, architecturally Nagios mixes apples (the monitoring part) with oranges (the notification part), although nothing prevents running two instances of Nagios and putting this infrastructure in between.
Functionally Nagios is not much more than a simple daemon that implements scheduling of probes. It is written in C. Generally it belongs to the class of agentless monitoring systems (like HP SiteScope), but its functionality for using SSH and telnet is very basic and is an afterthought.
Initially it was designed to monitor just two things:
In a way Nagios is still best suited to this kind of mixed monitoring: monitoring of network services and devices plus monitoring of the single host on which it is running.
By default, Nagios plugins run on the Nagios server, not on monitored resources (they are all scheduled to run on localhost). Later Nagios added the so-called Nagios Remote Plug-in Executor (NRPE), but in terms of architecture and functionality it is a hack that is not very competitive with the well-engineered agents found, for example, in commercial agent-based monitoring systems like HPE OpenView (although the OpenView agent is an example of overkill and represents the other extreme: it is way too complex and thus unreliable).
Remote probes can generate messages which are forwarded to the "mothership". Local probes can generate alerts which are displayed on the Web interface and simultaneously sent via SMTP mail to the sysadmin(s) involved.
There is nothing like the concept of an access path to a host or a "host access method" (for example telnet, ssh, NFS). The facilities provided (check_by_ssh and NRPE), while adequate, are both architected like hacks. For example, there is no way to associate a host or host group with a particular access method for probes in the Nagios configuration. For problems in using Nagios in a large enterprise environment see Deploying Nagios in a Large Enterprise Environment, at USENIX LISA '07.
Nagios functionality is definitely tilted toward monitoring network services. For a more application-oriented monitoring system one can try OpenSMART by Ulrich Herbst, but currently it is not supported and is sliding into abandonware. OpenSMART 2.0 is dated September 2011.
Nagios imposes very few limitations on the structure and communication of probes (plug-ins in Nagios localhost-centric terminology). A probe can be any legitimate Unix executable written in any language (shell, Perl, C, etc).
After execution Nagios:
Numeric Value | Service Status | Status Description |
---|---|---|
0 | OK | The plugin was able to check the service and it appeared to be functioning properly |
1 | Warning | The plugin was able to check the service, but it appeared to be above some "warning" threshold or did not appear to be working properly |
2 | Critical | The plugin detected that either the service was not running or it was above some "critical" threshold |
3 | Unknown | Invalid command line arguments were supplied to the plugin or low-level failures internal to the plugin (such as unable to fork, or open a tcp socket) that prevent it from performing the specified operation. Higher-level errors (such as name resolution errors, socket timeouts, etc) are outside of the control of plugins and should generally NOT be reported as UNKNOWN states. |
All major scripting languages used on Linux can return an exit code and write to STDOUT. That means that all of them are suitable for writing Nagios probes.
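To make the contract concrete, here is a minimal sketch of a probe written as a shell function: it compares a measured integer with warning and critical thresholds, prints one line of status text followed by performance data, and returns one of the four codes from the table. The names check_value and is_uint are invented for this illustration, and the "higher is worse" threshold logic is an assumption; real plug-ins often support ranges as well.

```shell
#!/bin/sh
# Succeeds only if the argument is a non-negative integer.
is_uint() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
        *)           return 0 ;;
    esac
}

# check_value NAME VALUE WARN CRIT
# Prints "NAME STATE - VALUE|NAME=VALUE;WARN;CRIT" and returns 0..3.
check_value() {
    name=$1; value=$2; warn=$3; crit=$4
    if ! is_uint "$value" || ! is_uint "$warn" || ! is_uint "$crit"; then
        echo "$name UNKNOWN - value and thresholds must be non-negative integers"
        return 3
    fi
    if [ "$value" -ge "$crit" ]; then
        echo "$name CRITICAL - $value|$name=$value;$warn;$crit"
        return 2
    elif [ "$value" -ge "$warn" ]; then
        echo "$name WARNING - $value|$name=$value;$warn;$crit"
        return 1
    fi
    echo "$name OK - $value|$name=$value;$warn;$crit"
    return 0
}
```

A stand-alone plug-in built on this would call check_value with a measured value and then finish with `exit $?`, so that the return code reaches Nagios as the process exit status.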
Communication with probes is via the Nagios environment, which consists of multiple macros. Those macros can also serve as parameters in probe invocation, for example to specify the host.
A plugin's command line parameters must follow some specific requirements:
Invocation of a plug-in can involve built-in macros that are passed as parameters. Each Nagios macro can be viewed as an environment variable that is populated by Nagios and whose value is inserted into the probe invocation string, for example:
define command{
        command_name    check_ping
        command_line    /usr/local/nagios/libexec/check_ping -H $HOSTADDRESS$ -w 100.0,90% -c 200.0,60%
        }
Here $HOSTADDRESS$ is a macro populated by Nagios. Actually, starting with Nagios 2.0, most macros have been made available as environment variables too.
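A probe can therefore obtain the host address in two ways. The sketch below prefers an explicit argument (the expanded $HOSTADDRESS$ from command_line) and otherwise falls back to the environment variable NAGIOS_HOSTADDRESS, which is the name under which Nagios exports that macro when environment macros are enabled in the main configuration. The function name target_address is invented for this example.

```shell
#!/bin/sh
# Print the address of the host being checked.
# $1 (optional): address passed on the command line.
# Falls back to NAGIOS_HOSTADDRESS; returns 3 (UNKNOWN) if neither is set.
target_address() {
    if [ -n "${1:-}" ]; then
        printf '%s\n' "$1"
    elif [ -n "${NAGIOS_HOSTADDRESS:-}" ]; then
        printf '%s\n' "$NAGIOS_HOSTADDRESS"
    else
        echo "UNKNOWN - no host address supplied" >&2
        return 3
    fi
}
```

Passing the value explicitly on the command line keeps the probe usable from an interactive shell, where the Nagios environment is not present.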
There is no concept of a "messages window" in Nagios and no operation on messages received from the probes is directly possible, although one can integrate post-processing of alerts pretty easily as a separate stage by processing the Nagios logfile.
The ability to use services like SSH and telnet for communication with the remote host is not built into the system: you need either to use NRPE (see below) or to specify each such probe in the configuration file. As each probe performs some small operation, invoking them via ssh is feasible only if the number of clients is less than, say, a hundred.
In Nagios there is no way to program operations for the next 24 hours on the client; all clients are completely dependent on the mothership. There is no buffering of messages if communication with the mothership is broken. This increases overhead and reduces scalability.
The structure of the events is extremely primitive. There are just two predefined fields, rules for the interpretation of which are hardwired into Nagios:
Everything else needs to be interpreted based on conventions you need to create for yourself.
The Nagios concept of alerts is nothing more than an ability to notify contacts (via email, pager or other methods) when a problem occurs. This is handled by so-called communication modules. Some of them (for SMTP) are provided in the distribution.
The only interesting architectural feature of Nagios that I have found is the concept of adaptive monitoring. Adaptive monitoring allows a Nagios configuration to be changed at runtime. Currently it is limited to the ability to change the interval between checks and the times during which checks are scheduled to occur. This allows you to turn checks on and off at specific times according to conditions in your environment. Not a very impressive capability, but a step in the right direction, as it makes it possible to avoid the "avalanche of alerts" effect.
In general you should be able to suppress events that depend on a particular higher-level event if that higher-level event has occurred. For example, if the router to the segment is down, it makes no sense to produce alerts for failed ssh service on hosts behind it. They should be suppressed.
Nagios prepackaged plug-ins are of low quality. For example, the ping plug-in does not understand that any enterprise server has at a minimum two network interfaces, main and management (ILO/DRAC), and that by comparing results between them you can tell whether this is a power outage or networking problem (both down) or an OS crash/hardware malfunction (the management interface is reachable, but the main interface is not).
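The two-interface logic is easy to express once the reachability of each interface is known. Below is a rough sketch that takes the two states as arguments and maps them to a diagnosis and a Nagios return code. The function name classify_outage is invented, and the treatment of the "main up, management down" case as a warning is my assumption; the stock ping plug-in does nothing of the kind.

```shell
#!/bin/sh
# classify_outage MAIN_STATE MGMT_STATE
# Each state is "up" or "down". Prints a diagnosis, returns a Nagios code.
classify_outage() {
    case "$1/$2" in
        up/up)
            echo "OK - main and management interfaces reachable"
            return 0 ;;
        down/up)
            echo "CRITICAL - OS crash or hardware malfunction (management interface reachable)"
            return 2 ;;
        down/down)
            echo "CRITICAL - power outage or networking problem (both interfaces down)"
            return 2 ;;
        up/down)
            # Assumption: a dead ILO/DRAC alone is worth only a warning.
            echo "WARNING - management interface unreachable"
            return 1 ;;
        *)
            echo "UNKNOWN - states must be 'up' or 'down'"
            return 3 ;;
    esac
}
```

In a real probe the two states would come from pinging the main address and the ILO/DRAC address of the same server.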
But on a primitive level they can monitor a pretty wide variety of system properties, including standard Unix performance metrics such as load average and free disk space on each or some partitions, the health of important services like HTTP and SMTP, and host network availability and reachability.
There are some capabilities which allow the notification list to be extended if the problem persists beyond a certain threshold.
In addition to detecting problems with host network connectivity and important network services, Nagios also allows the system administrator to specify what should be done if a particular state of a server (or service) is detected. Typically the action is just an alert sent to a designated recipient via various communication mechanisms (such as email, Unix message, pager). But it is also possible to define an event handler: a program that is run when a problem is detected. Such programs can attempt to solve the problem encountered, and they can also proactively prevent some serious problems when they are triggered by warning conditions.
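As an illustration of the event handler idea, here is a sketch of the decision logic only. It assumes the handler is invoked with the service state, the state type (SOFT or HARD) and the current check attempt number, which Nagios can pass via the $SERVICESTATE$, $SERVICESTATETYPE$ and $SERVICEATTEMPT$ macros. The function name handler_action and the "restart on the third soft failure" policy are assumptions made for this example.

```shell
#!/bin/sh
# handler_action STATE STATETYPE ATTEMPT
# Prints "restart" when a restart of the service should be attempted,
# "none" otherwise. It only decides; it does not restart anything.
handler_action() {
    state=$1; statetype=$2; attempt=$3
    case "$state" in
        CRITICAL)
            case "$statetype" in
                HARD)
                    echo restart ;;
                SOFT)
                    # Wait for a few retries before interfering.
                    if [ "$attempt" -ge 3 ]; then
                        echo restart
                    else
                        echo none
                    fi ;;
                *)
                    echo none ;;
            esac ;;
        *)
            # OK, WARNING and UNKNOWN need no action in this sketch.
            echo none ;;
    esac
}
```

Keeping the decision separate from the action makes the policy easy to test from the command line before the handler is wired into Nagios.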
Available actions in the Nagios Host Information display

Item | Meaning |
---|---|
Disable checks of this host | Stop monitoring this host for availability. |
Acknowledge this host problem | Respond to a current problem (discussed below). |
Disable notifications for this host | Don't send alerts if this host is unavailable. |
Delay next host notification | Delay the next alert for host unavailability. |
Schedule downtime for this host. Cancel scheduled downtime for this host. | Define or cancel schedule downtime. During downtime, host unavailability is not considered a problem. |
Disable notifications for all services on this host. Enable notifications for all services on this host. | Don't/do send alerts if a service on this host fails. |
Schedule an immediate check of all services on this host | Check all services as soon as possible (rather than waiting for their next scheduled time). |
Disable checks of all services on this host. Enable checks of all services on this host. | Disable or enable checking service health on this host. |
Disable event handler for this host | Prevent the event handler from running when a problem is detected on this host. |
Disable flap detection for this host | Don't try to detect flaps (rapid up-down or on-off oscillations) on this host or its services. |
The second menu item allows you to acknowledge any current problem. Acknowledging simply means "I know about the problem, and it is being handled." Nagios marks the corresponding event as such, and future alerts are suppressed until the item returns to its normal state. This process also allows you to enter a comment explaining the situation, an action that is helpful when more than one administrator regularly examines the monitoring data.
If you don't like all of these table-oriented status displays, Nagios also has the capability to use graphical ones. See the Nagios Web site for example screen shots.
Here is the list of plugins included in the EPEL distribution of Nagios. As you can see, around a dozen of them are Perl scripts and can be adapted into your own plugins. There is also the utils.pm module that can be used in your plug-ins.
root@centos:/nagios/plugins # file *
check_breeze: perl script
check_by_ssh: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.18, stripped
check_clamd: symbolic link to `check_tcp'
check_cluster: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.18, stripped
check_dhcp: setuid ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.18, stripped
check_dig: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.18, stripped
check_disk: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.18, stripped
check_disk_smb: perl script
check_dns: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.18, stripped
check_dummy: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.18, stripped
check_file_age: perl script
check_flexlm: perl script
check_fping: setuid ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.18, stripped
check_ftp: symbolic link to `check_tcp'
check_game: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.18, stripped
check_hpjd: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.18, stripped
check_http: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.18, stripped
check_icmp: setuid ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.18, stripped
check_ide_smart: setuid ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.18, stripped
check_imap: symbolic link to `check_tcp'
check_ircd: perl script
check_jabber: symbolic link to `check_tcp'
check_ldap: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.18, stripped
check_ldaps: symbolic link to `check_ldap'
check_load: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.18, stripped
check_log: POSIX shell script text executable
check_mailq: perl script
check_mrtg: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.18, stripped
check_mrtgtraf: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.18, stripped
check_mysql: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.18, stripped
check_mysql_query: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.18, stripped
check_nagios: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.18, stripped
check_nntp: symbolic link to `check_tcp'
check_nntps: symbolic link to `check_tcp'
check_nrpe: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.18, stripped
check_nt: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.18, stripped
check_ntp: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.18, stripped
check_ntp_peer: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.18, stripped
check_ntp.pl: perl script
check_ntp_time: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.18, stripped
check_nwstat: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.18, stripped
check_oracle: POSIX shell script text executable
check_overcr: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.18, stripped
check_pgsql: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.18, stripped
check_ping: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.18, stripped
check_pop: symbolic link to `check_tcp'
check_procs: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.18, stripped
check_real: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.18, stripped
check_rpc: perl script
check_sensors: POSIX shell script text executable
check_simap: symbolic link to `check_tcp'
check_smtp: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.18, stripped
check_snmp: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.18, stripped
check_spop: symbolic link to `check_tcp'
check_ssh: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.18, stripped
check_ssmtp: symbolic link to `check_tcp'
check_swap: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.18, stripped
check_tcp: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.18, stripped
check_time: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.18, stripped
check_udp: symbolic link to `check_tcp'
check_ups: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.18, stripped
check_users: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.18, stripped
check_wave: perl script
eventhandlers: directory
negate: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.18, stripped
utils.pm: Perl5 module source text
utils.sh: POSIX shell script text executable
The Web interface is now part of the standard distribution. In recent versions it was rewritten using PHP, which adds complexity, but is OK for sysadmins who use or support LAMP. The information that Nagios collects is displayed in a set of automatically updated Web pages. Several CGIs are included to allow you to view the current and historical status via a Web browser.
You can view it at http://localhost/nagios or http://server_dns_name/nagios
Access is via accounts that need to be created. Initially only the account nagiosadmin exists.
A WAP interface is also provided to allow you to acknowledge problems and disable notifications from an internet-ready cellphone. The narrow column on the left of the display lists links to all of the possible Nagios web pages (the one for the current page has been highlighted in the illustration).
The Tactical Overview page shows a summary or "general statistics" about the overall status of the monitored infrastructure, such as the number of hosts which are down, unreachable, etc. The display also indicates the number of services in "critical" status (probably indicating a failure), as well as in other states. Each of the problem indicators also functions as a link to another Web page giving details about that particular item.
When you restart Nagios, the web interface loses track of its PID, so Apache needs to be restarted as well.
As with any popular software package, Nagios can be installed from RPMs on RHEL/CentOS/SUSE. It is available from the EPEL repository. On CentOS 6 you can follow the steps recommended by DigitalOcean in How To Install Nagios On CentOS 6. Here is a slightly modified version derived from it:
Step 1 - Install Packages on Monitoring Server
rpm -Uvh http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
rpm -Uvh http://rpms.famillecollet.com/enterprise/remi-release-6.rpm
yum -y install nagios nagios-plugins-all nagios-plugins-nrpe nrpe php httpd
chkconfig httpd on && chkconfig nagios on
service httpd start && service nagios start
Step 2 - Set Admin Panel Password
htpasswd -c /etc/nagios/passwd nagiosadmin
Make sure to keep this username as "nagiosadmin" - otherwise you would have to change /etc/nagios/cgi.cfg and redefine authorized admin.
Step 3 - Log in to the Web interface
Now you can point your Web browser to the Nagios server IP address http://Server_IP/nagios and log in.
You will be prompted for the password you set in Step 2.
For a large number of servers, configuring Nagios is very verbose, and the corresponding files should generally be generated with a macro generator like M4 or custom Perl scripts, or you will hate Nagios from day one :-).
This is important first of all for generating the set of host files: Nagios needs one definition for each host. This is easy to do using Perl from /etc/hosts. In the most primitive form it can be something like:
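The sketch below does this in plain shell rather than Perl, so it is my substitute and not the author's script. It reads a hosts-format file and writes one NAME.cfg per entry, each using the linux-server template shipped in the default templates.cfg. The function name gen_host_defs is invented; loopback entries are skipped, and the first name on each line is used as both host_name and alias, which is an assumption you may want to change.

```shell
#!/bin/sh
# gen_host_defs HOSTS_FILE OUT_DIR
# Writes OUT_DIR/NAME.cfg with a minimal host definition for every
# "address name [aliases...]" line in HOSTS_FILE.
gen_host_defs() {
    hosts_file=$1
    out_dir=$2
    mkdir -p "$out_dir" || return 1
    # Strip comments first, then split each line into address and first name.
    sed 's/#.*//' "$hosts_file" |
    while read -r addr name rest; do
        [ -n "$name" ] || continue          # blank or incomplete line
        case "$name" in
            localhost*) continue ;;         # skip loopback entries
        esac
        printf 'define host{\n        use        linux-server\n        host_name  %s\n        alias      %s\n        address    %s\n        }\n' \
            "$name" "$name" "$addr" > "$out_dir/$name.cfg"
    done
}
```

A call such as `gen_host_defs /etc/hosts /etc/nagios/servers` would populate the directory; it is worth pointing it at a scratch directory first and reviewing the result before Nagios reads it.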
Host files can be stored as one file per server (if there are not many of them, say, fewer than 100) or grouped by server group. Good practice is to use a special directory /etc/nagios/servers for this purpose. But if you wish they can be arbitrarily distributed between multiple files.
Services should be linked to servers only via groups.
Also, certain directories can be designated as containing configuration files, and thus you can put each definition into a separate file. Logically Nagios concatenates those files in the order they are listed in the configuration file and processes the result as a single large file.
The default configuration does not include any monitored hosts other than localhost, and the file that defines it does not conform to the recommendations above and generally needs to be split into a host part and a services part.
Configuring Nagios hosts and services is pretty time-consuming and boring, but at the same time pretty educational, as printing and reading the provided configuration files gives you some additional information about the capabilities of the system.
The typical way to configure Nagios is to create a definition for one host in a group and then use a macro generator or Perl to create definitions for all other hosts. Then repeat the same process for each server group.
For services there is no such recipe and much depends on what services you want to monitor. The Internet is your friend, as for most services you can find various ready-made configurations which can serve as a starting point.
Nagios uses half a dozen configuration files. Recent versions can also use configuration directories, which is very convenient for hosts, as you can generate each host definition as a separate file. This simplifies maintenance, as you can compare the directory listing with the hosts file using a simple script (in large datacenters, and when you use multiple VMs, "zombie servers" are a real problem ;-).
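Such a comparison script can be very short. The following sketch assumes one NAME.cfg per host in the configuration directory and a hosts-format file as the source of truth; it prints the names that have a definition file but no longer appear in the hosts file. The function name stale_host_files is invented for this example.

```shell
#!/bin/sh
# stale_host_files HOSTS_FILE CFG_DIR
# Prints NAME for every CFG_DIR/NAME.cfg whose NAME is not listed
# as a host name or alias in HOSTS_FILE.
stale_host_files() {
    hosts_file=$1
    cfg_dir=$2
    for f in "$cfg_dir"/*.cfg; do
        [ -e "$f" ] || continue             # directory holds no .cfg files
        name=$(basename "$f" .cfg)
        # Look for NAME among fields 2..NF (names and aliases), ignoring comments.
        if ! awk -v n="$name" '
                { sub(/#.*/, "") }
                { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
                END { exit !found }' "$hosts_file"
        then
            printf '%s\n' "$name"
        fi
    done
}
```

Run from cron, for example as `stale_host_files /etc/hosts /etc/nagios/servers`, it gives a list of candidates for removal rather than removing anything itself.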
Among them
Nagios configuration files are typically stored in /etc/nagios. If you use EPEL RPMs, after installation you will have the following files:
root@centos:/etc/nagios # ll
total 92
-rw-rw-r-- 1 root root   12988 Nov 24 12:49 cgi.cfg
drwxr-x--- 2 root nagios  4096 Nov 24 12:49 conf.d
-rw-rw-r-- 1 root root   44662 Nov 24 12:49 nagios.cfg
-rw-r--r-- 1 root root   12806 Aug  4 16:23 nrpe.cfg
drwxr-x--- 2 root nagios  4096 Jan 23 18:35 objects
-rw-r----- 1 root apache    26 Jan 23 18:36 passwd
drwxr-x--- 2 root nagios  4096 Jan 23 18:35 private
Ownership is tricky, and the "everything is root:root" principle does not work, as some directories are not world readable/executable. Those should be owned by root:nagios.
NOTE: Nagios is flexible as to which configuration files it uses. The set of such files is defined by cfg_file directives in the nagios.cfg file. When Nagios starts, it reads all files in the order they are listed in nagios.cfg and merges them into a single virtual file in that order. If an object is defined multiple times (for example in two different files) only the last definition is used.
As you can see, you have three configuration directories (conf.d, objects and private) and four files. The directory conf.d is initially empty. It is convenient to store host group definitions and service definitions in it.
But this is in no way required.
The directory private contains a single file, resource.cfg, which defines several macros (you can add your own, see below):
# Sets $USER1$ to be the path to the plugins
$USER1$=/usr/lib64/nagios/plugins
# Sets $USER2$ to be the path to event handlers
#$USER2$=/usr/lib64/nagios/plugins/eventhandlers
# Store some usernames and passwords (hidden from the CGIs)
#$USER3$=someuser
#$USER4$=somepassword
The directory objects is of special interest to us. Initially, after installation from RPMs, it contains eight files.
root@centos:/etc/nagios # cd objects
root@centos:/nagios/objects # ll
total 48
-rw-rw-r-- 1 root root  7684 Nov 24 12:49 commands.cfg
-rw-rw-r-- 1 root root  2138 Nov 24 12:49 contacts.cfg
-rw-rw-r-- 1 root root  5379 Nov 24 12:49 localhost.cfg
-rw-rw-r-- 1 root root  3069 Nov 24 12:49 printer.cfg
-rw-rw-r-- 1 root root  3252 Nov 24 12:49 switch.cfg
-rw-rw-r-- 1 root root 10941 Nov 24 12:49 templates.cfg
-rw-rw-r-- 1 root root  3178 Nov 24 12:49 timeperiods.cfg
-rw-rw-r-- 1 root root  3991 Nov 24 12:49 windows.cfg
Initially we will be interested in three of them (contacts.cfg, commands.cfg, and templates.cfg).
Most configuration files in Nagios, while verbose, are pretty logically structured. The amount of information specified can be cut using templates.
This configuration file (nagios.cfg) contains directives that apply to the entire Nagios monitoring system. It can contain any number of cfg_file directives, which are simple includes that permit factoring out common sets of configuration directives and storing them as individual files.
The first part of the configuration file specifies various file locations, including the general log file, files holding service check command and notification and event handler command definitions (checkcommands and misccommands). Other cfg_file directives are used by the administrator to specify the object definition files in use at that site (indicated by the one in red). Locations for other types of files follow. The lock file holds the PID of the current Nagios process.
Two directives to pay attention to are cfg_file and cfg_dir. Here is the list from the default installation:
root@centos:/etc/nagios # egrep '^cfg' nag*
cfg_file=/etc/nagios/objects/commands.cfg
cfg_file=/etc/nagios/objects/contacts.cfg
cfg_file=/etc/nagios/objects/timeperiods.cfg
cfg_file=/etc/nagios/objects/templates.cfg
cfg_file=/etc/nagios/objects/localhost.cfg
cfg_dir=/etc/nagios/conf.d
...
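To see which object files such a list actually pulls in, the directives can be expanded with a few lines of shell. This is only a sketch: the function name list_object_files is invented, cfg_dir is expanded to every *.cfg file found beneath the directory, and the files of one directory are sorted here merely to make the output stable, which says nothing about the order Nagios itself uses inside a directory.

```shell
#!/bin/sh
# list_object_files MAIN_CFG
# Prints the object configuration files referenced by the cfg_file and
# cfg_dir directives of MAIN_CFG, in the order the directives appear.
list_object_files() {
    main_cfg=$1
    grep -E '^cfg_(file|dir)=' "$main_cfg" |
    while IFS='=' read -r key value; do
        case "$key" in
            cfg_file) printf '%s\n' "$value" ;;
            cfg_dir)  find "$value" -type f -name '*.cfg' | sort ;;
        esac
    done
}
```

Calling it as `list_object_files /etc/nagios/nagios.cfg` gives a quick inventory that can be compared with what you expect Nagios to load.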
There are also multiple directives related to logging:
root@centos:/etc/nagios # egrep 'log' nag* | egrep -v '^#'
log_file=/var/log/nagios/nagios.log
status_file=/var/log/nagios/status.dat
log_rotation_method=d
log_archive_path=/var/log/nagios/archives
use_syslog=1
log_notifications=1
log_service_retries=1
log_host_retries=1
log_event_handlers=1
log_initial_states=0
log_current_states=1
log_external_commands=1
log_passive_checks=1
These directives specify logging settings, including how often logs are rotated (here, daily), the archive directory for old files, whether to log significant problems to syslog as well, and whether to log individual event types.
root@centos:/etc/nagios # egrep '^nagios|^admin|^date' nagios.cfg | egrep -v '^#'
nagios_user=nagios
nagios_group=nagios
date_format=us
admin_email=nagios@localhost
admin_pager=pagenagios@localhost
These lines specify various global settings, including the user/group as which the Nagios daemon runs, the output format for dates (here US style is specified), and the administrator's email address. The final item sets the value of the $ADMINPAGER$ macro, which can be used in command definitions.
# Package-wide event handlers
enable_event_handlers=1
global_host_event_handler=global-event-command
global_service_event_handler=global-svc-command
Settings related to event handlers. You can optionally define a single event handler for all host failures and service failures in this file if appropriate. Commands are defined in an object configuration file.
# Concurrent checks and time-outs
max_concurrent_checks=0
service_check_timeout=60
host_check_timeout=30
event_handler_timeout=30
notification_timeout=30
...
These directives control the maximum number of checks that can run at the same time (0 means an unlimited number), as well as time-outs for various types of commands (values in seconds).
# Retained status information
retain_state_information=1
retention_update_interval=60
use_retained_program_state=1
These lines tell Nagios to retain information about host and service status between sessions, saving the values every 60 seconds, and reloading them when the facility starts up.
# Passive service checks
accept_passive_service_checks=1
check_service_freshness=1
These directives enable "passive checks": status data produced by external commands which Nagios imports periodically.
# Save Nagios data for later use
process_performance_data=1
host_perfdata_command=process-host-perfdata
service_perfdata_command=process-service-perfdata
These directives allow you to save Nagios data externally for long term analysis or other purposes. The commands specified here must be defined in some object configuration file. The simplest such command simply writes the command's output to an external file, e.g., echo $OUTPUT$ >> file, but you can perform whatever action is appropriate (e.g., send the data to an RRDTool or other database).
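A slightly more useful variant of the echo approach is to record who produced the data and when. The sketch below appends one tab-separated line per result to a file; the function name log_perfdata and the record layout are my own choices for illustration, and the arguments are assumed to be supplied by the command definition from macros such as $HOSTNAME$, $SERVICEDESC$ and $SERVICEPERFDATA$.

```shell
#!/bin/sh
# log_perfdata FILE HOST SERVICE PERFDATA
# Appends "epoch<TAB>host<TAB>service<TAB>perfdata" to FILE.
# Results that carry no performance data are silently skipped.
log_perfdata() {
    file=$1; host=$2; service=$3; perfdata=$4
    [ -n "$perfdata" ] || return 0
    printf '%s\t%s\t%s\t%s\n' "$(date +%s)" "$host" "$service" "$perfdata" >> "$file"
}
```

A flat file like this is easy to feed later into RRDTool or a database loader, which keeps the command that Nagios runs on every check as cheap as possible.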
The bulk of Nagios configuration occurs in the set of files which we will call "object configuration files". These files define hosts and services to be monitored, how various status conditions should be interpreted, and what actions should be taken when they occur. These files are used to define the following items:
There are also several supplementary definitions:
The items in red will need to be defined for virtually every Nagios installation; the ones in black are optional. In the sample Nagios configuration provided with the package, each type of object is defined in a separate configuration file (named after the object type, excluding any spaces). However, you can arrange your definitions in any form that makes sense to you.
It is good practice to define hosts in a special directory. I recommend creating a special directory, for example /etc/nagios/servers, for those hosts and generating files into it automatically.
For example:
echo "cfg_dir=/etc/nagios/servers" >> /etc/nagios/nagios.cfg
mkdir /etc/nagios/servers && chown root:nagios /etc/nagios/servers
cd /etc/nagios/servers
Common configuration items for a specific class of hosts (for example Linux hosts, HPC nodes, etc) are typically defined via templates: named sets of attributes and settings which can later be easily applied to any number of actual objects. Traditionally host templates are stored in a single file, templates.cfg. For example, here are the default template definitions for hosts. The file contains two templates relevant for Linux servers: generic-host and linux-server (which inherits from generic-host):
# Generic host definition template - This is NOT a real host, just a template!
define host{
    name                           generic-host  ; The name of this host template
    notifications_enabled          1             ; Host notifications are enabled
    event_handler_enabled          1             ; Host event handler is enabled
    flap_detection_enabled         1             ; Flap detection is enabled
    process_perf_data              1             ; Process performance data
    retain_status_information      1             ; Retain status information across program restarts
    retain_nonstatus_information   1             ; Retain non-status information across program restarts
    notification_period            24x7          ; Send host notifications at any time
    register                       0             ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL HOST, JUST A TEMPLATE!
}

# Linux host definition template - This is NOT a real host, just a template!
define host{
    name                    linux-server      ; The name of this host template
    use                     generic-host      ; This template inherits other values from the generic-host template
    check_period            24x7              ; By default, Linux hosts are checked round the clock
    check_interval          5                 ; Actively check the host every 5 minutes
    retry_interval          1                 ; Schedule host check retries at 1 minute intervals
    max_check_attempts      10                ; Check each Linux host 10 times (max)
    check_command           check-host-alive  ; Default command to check Linux hosts
    notification_period     workhours         ; Linux admins hate to be woken up, so we only notify during the day
                                              ; Note that the notification_period variable is being overridden from
                                              ; the value that is inherited from the generic-host template!
    notification_interval   120               ; Resend notifications every 2 hours
    notification_options    d,u,r             ; Only send notifications for specific host states
    contact_groups          admins            ; Notifications get sent to the admins by default
    register                0                 ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL HOST, JUST A TEMPLATE!
}
This template defines a variety of host-monitoring settings (explained in the comments following the semicolons). Now we can define multiple hosts based on this template with the use directive (the "use linux-server" line refers to the template and can be understood as an inclusion directive):
define host{
    ; Template on which to base host
    use linux-server
    ; Note the attribute is not "name" as above
    host_name n01
    ; Longer description
    alias n01
    ; IP address
    address 10.10.1.44
    ; Overrides template value
    max_check_attempts 8
}
Other hosts may be defined in a similar way. Host definitions themselves can also be used as templates.
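To make the inheritance rule concrete, here is a minimal Python sketch of how a chain of use directives could be resolved. It is an illustration only, not Nagios code: resolve() is a hypothetical helper, the dictionaries hold just a few of the attributes from the templates above, and it assumes a single parent per definition (real Nagios also accepts a comma-separated list of templates).

```python
# Sketch of Nagios-style template inheritance via the "use" directive.
# resolve() is a hypothetical helper written for illustration, not part of Nagios.

TEMPLATES = {
    "generic-host": {"notifications_enabled": "1",
                     "notification_period": "24x7",
                     "register": "0"},
    "linux-server": {"use": "generic-host",
                     "max_check_attempts": "10",
                     "notification_period": "workhours",
                     "register": "0"},
}

def resolve(definition, templates):
    """Return effective attributes: inherited values first, local values override."""
    merged = {}
    parent = definition.get("use")
    if parent is not None:
        merged.update(resolve(templates[parent], templates))
    # "name", "use" and "register" describe the template itself and are not inherited
    merged = {k: v for k, v in merged.items() if k not in ("name", "use", "register")}
    merged.update({k: v for k, v in definition.items() if k != "use"})
    return merged

host = {"use": "linux-server", "host_name": "n01", "alias": "n01",
        "address": "10.10.1.44", "max_check_attempts": "8"}
effective = resolve(host, TEMPLATES)
print(effective["max_check_attempts"])   # the host's own value, not the template's 10
print(effective["notification_period"])  # workhours, as overridden in linux-server
```

The point of the sketch is the order of the merge: values travel down the chain from generic-host to linux-server to the host, and the most specific definition wins.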
The simplest way to define hosts (at least initially) is to generate the files from the /etc/hosts file. Here is a Perl script that can be used as a starting point:
#!/usr/bin/perl
# gen_nagios_hosts -- generates a set of files (one per host) for Nagios in the /etc/nagios/servers directory
# using lines extracted from the /etc/hosts file via grep or some other filtering method.
#
# Nikolai Bezroukov, May 16, 2016. Version 1.0
# Public domain software
#
# Usage
#    grep -Ev 'localhost|domain2' /etc/hosts | gen_nagios_hosts
# NOTE: grep should be used for excluding all lines that should not be generated. You can also create a separate file, edit it manually
# and use the invocation
#
#    gen_nagios_hosts /tmp/relevant_hosts
#
# Additional filtering can also be implemented inside the main loop (currently only comment lines, blank lines and localhost are skipped)
#
# The short name of the host should be the second name in each /etc/hosts host definition line:
#    IP_ADDRESS LONG_NAME SHORT_NAME ANYTHING_ELSE
#
# For example
#    10.20.10.33 node01.company.com node01
# Note that for this host the file generated will be /etc/nagios/servers/node01.cfg
use strict;
use warnings;

my $TARGET='/etc/nagios/servers';
unless ( -d $TARGET ) {
   `mkdir -p $TARGET && chown root:nagios $TARGET && chmod 750 $TARGET`;
}
#
# Main loop
#
while ( my $line=<> ) {
   chomp($line);
   next if ( $line =~ /^\s*(#|$)/ );               # skip comments and blank lines
   next if ( index($line,'localhost') > -1 );      # always skip localhost
   my ($ip,$long_name,$short_name)=split(/\s+/,$line); # split the line extracting short name and IP
   next unless ( defined $short_name );            # skip lines that have no short name
   open(my $sysout,'>',"$TARGET/$short_name.cfg") or die "Cannot write $TARGET/$short_name.cfg: $!";
   my $output="define host {\n".
      "\tuse\t\tlinux-server\n".
      "\thost_name\t$short_name\n".
      "\talias\t\t$long_name\n".
      "\taddress\t\t$ip\n}\n";
   print $output;
   print $sysout $output;
   close $sysout;
}
`chown root:root $TARGET/* && chmod 644 $TARGET/*`; # ensure correct ownership and mode independently of system settings
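For readers who prefer to test the generation logic before touching /etc/nagios, the same idea can be sketched in Python as a pure function that returns the text of each definition instead of writing files. This is an illustrative rewrite under the same assumptions as the Perl script (third field is the short name, linux-server is the template); host_definitions() is a hypothetical name.

```python
# Sketch: turn /etc/hosts-style lines into Nagios "define host" blocks.
# host_definitions() is a hypothetical helper; it only builds text and
# leaves writing the files (and setting ownership) to the caller.

def host_definitions(hosts_lines, template="linux-server"):
    """Map short host name -> text of a 'define host' block."""
    result = {}
    for raw in hosts_lines:
        line = raw.strip()
        if not line or line.startswith("#") or "localhost" in line:
            continue  # skip blanks, comments and localhost, as the Perl script does
        fields = line.split()
        if len(fields) < 3:
            continue  # no short name on this line
        ip, long_name, short_name = fields[:3]
        result[short_name] = (
            "define host {\n"
            f"\tuse\t\t{template}\n"
            f"\thost_name\t{short_name}\n"
            f"\talias\t\t{long_name}\n"
            f"\taddress\t\t{ip}\n"
            "}\n"
        )
    return result

sample = [
    "# cluster nodes",
    "127.0.0.1 localhost",
    "10.20.10.33 node01.company.com node01",
]
defs = host_definitions(sample)
print(defs["node01"])
```

Each value in the returned dictionary can then be written to a file named after its key, for example node01.cfg.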
Once hosts have been defined, they should be placed into host groups. A host group is a concept related to the display of information about hosts in the Web interface:
define hostgroup{
hostgroup_name linux
alias linux
contact_groups admins
members n01,n02,n03
}
This definition creates the host group named linux, consisting of three hosts (which should already be defined via define host directives). The contact_groups attribute specifies who to send notifications to, and it is defined elsewhere (as we'll see).
Importantly, host groups can be built out of host groups defined earlier (nesting): a host group may include already defined host groups instead of individual members.
You can use as many host groups as you need, depending on your monitoring needs. Hosts can be part of multiple host groups. For example, here are definitions that can be used for a small HPC cluster:
################################################################################
define hostgroup{
    hostgroup_name old
    alias z52-z56
    members z52, z53, z54, z55, z56
}
################################################################################
define hostgroup{
    hostgroup_name dell
    alias dell blades
    members b1, b2, b3, b4, b5, b6, b7, b8, b9, b10, b11, b12, b13, b14, b15, b16
}
### HP blades.
define hostgroup{
    hostgroup_name hp
    alias New hp blades
    members b21, b22, b23, b24, b25, b26, b27, b28, b29, b30, b31, b32, b33, b34
}
define hostgroup{
    hostgroup_name z99
    alias headnode
    members z99
}
define hostgroup{
    hostgroup_name all
    alias all servers
    hostgroup_members z99,old,dell,hp
}
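How nesting expands into a flat list of hosts can be sketched in a few lines of Python. This is only a model of the idea, not how Nagios implements it; expand() is a hypothetical helper and the member lists are shortened versions of the cluster example above.

```python
# Sketch: expand nested host groups into the set of hosts they contain.
# expand() is a hypothetical helper; the data is a shortened version of the
# cluster example (members list hosts, hostgroup_members list other groups).

HOSTGROUPS = {
    "old": {"members": ["z52", "z53"]},
    "z99": {"members": ["z99"]},
    "all": {"hostgroup_members": ["z99", "old"]},
}

def expand(group, groups, seen=None):
    """Return all hosts in a group, following hostgroup_members recursively."""
    seen = set() if seen is None else seen
    if group in seen:          # guard against groups that include each other
        return set()
    seen.add(group)
    entry = groups[group]
    hosts = set(entry.get("members", []))
    for child in entry.get("hostgroup_members", []):
        hosts |= expand(child, groups, seen)
    return hosts

print(sorted(expand("all", HOSTGROUPS)))
```

Because the result is a set, a host that is reachable through several groups is counted once.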
You can use a similar arrangement for the services that run on your hosts, defining each service in a separate file in a services directory.
As with hosts, defining a service can be a two-stage process. First you define a template for the service, and then you create individual definitions using this template.
Here is an example of two templates. The first template (generic) defines some settings which can be applied to the majority of services. The second template, generic-SMTP, uses the first template as a starting point and adds some email-related customization in order to create a generic SMTP monitoring service.
Of course you should not demonstrate too much zeal. Deep nesting is difficult to understand and debug. Two levels is probably the upper useful limit for templates (if not one ;-)
define service{ ; Define defaults for all services
name generic
register 0
; Check service every 30 minutes
normal_check_interval 30
; Retry failing checks every 3 minutes, up to 5 times
retry_check_interval 3
max_check_attempts 5
event_handler_enabled 1
check_period 24x7
; Repeat notifications for failures every 2 hours
notification_interval 120
notification_period 6to22
; Notify contacts about critical failures/recoveries
notification_options c,r
notifications_enabled 1
contact_groups admins
}
define service{ ; Define the SMTP service
use generic
name generic-SMTP
register 0
service_description Check SMTP
check_command check_smtp
event_handler eh_smtp
contact_groups mailadmins
}
Now you can define services using these templates. This is very similar to the host definitions, but you can (and should) use host groups instead of individual server names if the service exists on multiple hosts (which is typically the case for such services as ssh, telnet, PostgreSQL, etc.).
define service{ ; Define services to be monitored
name headquarters_mail_health_check
use generic-SMTP
; Monitor SMTP for all hosts in this host group
hostgroup_name headquarters_mailhosts
}
define service{ ; Define services to be monitored
name data_center1_mail_health_check
use generic-SMTP
; Monitor SMTP for all hosts in this host group
hostgroup_name data_center1_mailhosts
}
As you can see, these definitions differ only in hostgroup_name (and in their own name).
The default configuration provides the definition of a single contact called nagiosadmin and a single contact group, admins:
root@centos:/nagios/objects # egrep -v '^#' contacts.cfg
define contact{
    contact_name nagiosadmin      ; Short name of user
    use          generic-contact  ; Inherit default values from generic-contact template (defined above)
    alias        Nagios Admin     ; Full name of user
    email        nagios@localhost ; <<***** CHANGE THIS TO YOUR EMAIL ADDRESS ******
}
define contactgroup{
    contactgroup_name admins
    alias             Nagios Administrators
    members           nagiosadmin
}
As you can see, only one contact group is defined: admins. This is the group that you should work with until you become more knowledgeable about Nagios. Just add additional people to it.
Time period definitions are quite simple. Here are the default definitions:
define timeperiod{
    timeperiod_name 24x7
    alias           24 Hours A Day, 7 Days A Week
    sunday          00:00-24:00
    monday          00:00-24:00
    tuesday         00:00-24:00
    wednesday       00:00-24:00
    thursday        00:00-24:00
    friday          00:00-24:00
    saturday        00:00-24:00
}
define timeperiod{
    timeperiod_name workhours
    alias           Normal Work Hours
    monday          09:00-17:00
    tuesday         09:00-17:00
    wednesday       09:00-17:00
    thursday        09:00-17:00
    friday          09:00-17:00
}
define timeperiod{
    timeperiod_name none
    alias           No Time Is A Good Time
}
define timeperiod{
    name                 us-holidays
    timeperiod_name      us-holidays
    alias                U.S. Holidays
    january 1            00:00-00:00 ; New Years
    monday -1 may        00:00-00:00 ; Memorial Day (last Monday in May)
    july 4               00:00-00:00 ; Independence Day
    monday 1 september   00:00-00:00 ; Labor Day (first Monday in September)
    thursday 4 november  00:00-00:00 ; Thanksgiving (4th Thursday in November)
    december 25          00:00-00:00 ; Christmas
}
define timeperiod{
    timeperiod_name 24x7_sans_holidays
    alias           24x7 Sans Holidays
    use             us-holidays ; Get holiday exceptions from other timeperiod
    sunday          00:00-24:00
    monday          00:00-24:00
    tuesday         00:00-24:00
    wednesday       00:00-24:00
    thursday        00:00-24:00
    friday          00:00-24:00
    saturday        00:00-24:00
}
Only the applicable days need to be included in a definition. The only definition that you might need to modify is the holidays one. You can also add a definition for a maintenance period.
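The weekday rules are simple enough to model directly. The following Python sketch checks whether a given moment falls inside a period defined with weekday ranges such as the workhours definition; it is an illustration, not Nagios logic. The in_period() helper is hypothetical, it treats the end of a range as exclusive (an assumption), and it ignores date exceptions such as the holiday entries.

```python
# Sketch: test whether a moment falls inside a weekday-based time period.
# in_period() is a hypothetical helper; it handles only "HH:MM-HH:MM" weekday
# ranges (optionally comma-separated), not holiday-style date exceptions.
from datetime import datetime

DAYS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]

WORKHOURS = {day: "09:00-17:00" for day in DAYS[:5]}  # mirrors the workhours period

def minutes(hhmm):
    """Convert 'HH:MM' to minutes since midnight ('24:00' becomes 1440)."""
    hours, mins = hhmm.split(":")
    return int(hours) * 60 + int(mins)

def in_period(period, moment):
    """Return True if moment lies in one of the ranges defined for its weekday."""
    ranges = period.get(DAYS[moment.weekday()])
    if ranges is None:
        return False  # a day that is not listed is never inside the period
    now = moment.hour * 60 + moment.minute
    for item in ranges.split(","):
        start, end = item.split("-")
        if minutes(start) <= now < minutes(end):  # end treated as exclusive (assumption)
            return True
    return False

print(in_period(WORKHOURS, datetime(2016, 5, 16, 10, 0)))  # a Monday morning
print(in_period(WORKHOURS, datetime(2016, 5, 15, 10, 0)))  # a Sunday morning
```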
Commands in Nagios are essentially macros that associate a command string to be executed with a macro name. For example, the macro name check_smtp can be associated with the execution string $USER1$/check_smtp -H $HOSTADDRESS$:
define command{
command_name check_smtp
command_line $USER1$/check_smtp -H $HOSTADDRESS$
}
This command runs the check_smtp script stored in the directory defined in the macro $USER1$ (which should be defined in the resource.cfg file, see below); this macro by convention always holds the path to the Nagios plug-ins directory. The command is passed the option -H, followed by the IP address of the host to be checked (the latter is expanded from the built-in $HOSTADDRESS$ macro).
define command{
command_name fix_smtp
command_line /usr/local/nagios/eh/fix_mail $HOSTADDRESS$ $STATETYPE$
}
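Here, we define the command named fix_smtp. It specifies the full path to a program to run, passing two arguments: the host's IP address and the value of the $STATETYPE$ macro. This macro is set to HARD once a problem has been confirmed by max_check_attempts consecutive checks, and to SOFT while Nagios is still retrying the check (the default commands.cfg of newer versions uses the $HOSTSTATETYPE$ and $SERVICESTATETYPE$ macros for the same purpose). You can determine the syntax for any plug-in by running it with the --help option. You can also (and should) extend Nagios by adding custom plug-ins of your own.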
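The substitution that turns a command definition into an actual command line can be sketched as follows. This is a simplified model, not the Nagios implementation: expand_command() is a hypothetical helper, the macro values are examples, and it relies on the Nagios convention that arguments after the command name are separated by "!" and become $ARG1$, $ARG2$ and so on.

```python
# Sketch: expand $MACRO$ references in a Nagios command_line.
# expand_command() is a hypothetical helper; unknown macros expand to an
# empty string here, which is a simplifying assumption.
import re

COMMANDS = {
    "check_smtp": "$USER1$/check_smtp -H $HOSTADDRESS$",
    "check_ping": "$USER1$/check_ping -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ -p 5",
}

MACROS = {
    "USER1": "/usr/lib64/nagios/plugins",  # example plug-in directory
    "HOSTADDRESS": "10.10.1.44",           # example host address
}

def expand_command(check_command, commands, macros):
    """Split 'name!arg1!arg2' and substitute macros in the matching command_line."""
    name, *args = check_command.split("!")
    values = dict(macros)
    for number, arg in enumerate(args, start=1):
        values[f"ARG{number}"] = arg
    return re.sub(r"\$([A-Z0-9_]+)\$",
                  lambda match: values.get(match.group(1), ""),
                  commands[name])

print(expand_command("check_smtp", COMMANDS, MACROS))
print(expand_command("check_ping!100.0,20%!500.0,60%", COMMANDS, MACROS))
```

Seeing the fully expanded string is often the quickest way to debug a check: the result can be pasted into a shell and run by hand as the nagios user.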
Here are the definitions of commands used for notifications (we've wrapped the command_line setting for clarity):
define command{
command_name notify-by-email
command_line /usr/bin/printf "%b" "***** Nagios 1.0 *****\n\n
Notification Type: $NOTIFICATIONTYPE$\n\n
Service: $SERVICEDESC$\n
Host: $HOSTALIAS$\n
Address: $HOSTADDRESS$\n
State: $SERVICESTATE$\n\n
Date/Time: $DATETIME$\n\n
Additional Info:\n\n$OUTPUT$" |
/usr/bin/mail -s "** $NOTIFICATIONTYPE$ alert - $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ **" $CONTACTEMAIL$
}
This command constructs a simple email message using the printf command and built-in Nagios macros. It then sends the message using the mail command, specifying the recipient via the $CONTACTEMAIL$ macro. The latter contains the value of the email attribute of the contact who is being notified about the alert.
The default contents of the commands.cfg file that you get after installation are as follows:
root@centos:/nagios/objects # egrep -v '^#' commands.cfg | grep -v '^$'
define command{
    command_name notify-host-by-email
    command_line /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\nHost: $HOSTNAME$\nState: $HOSTSTATE$\nAddress: $HOSTADDRESS$\nInfo: $HOSTOUTPUT$\n\nDate/Time: $LONGDATETIME$\n" | /usr/bin/mail -s "** $NOTIFICATIONTYPE$ Host Alert: $HOSTNAME$ is $HOSTSTATE$ **" $CONTACTEMAIL$
}
define command{
    command_name notify-service-by-email
    command_line /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$SERVICEOUTPUT$\n" | /usr/bin/mail -s "** $NOTIFICATIONTYPE$ Service Alert: $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ **" $CONTACTEMAIL$
}
define command{
    command_name check-host-alive
    command_line $USER1$/check_ping -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -p 5
}
define command{
    command_name check_local_disk
    command_line $USER1$/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$
}
define command{
    command_name check_local_load
    command_line $USER1$/check_load -w $ARG1$ -c $ARG2$
}
define command{
    command_name check_local_procs
    command_line $USER1$/check_procs -w $ARG1$ -c $ARG2$ -s $ARG3$
}
define command{
    command_name check_local_users
    command_line $USER1$/check_users -w $ARG1$ -c $ARG2$
}
define command{
    command_name check_local_swap
    command_line $USER1$/check_swap -w $ARG1$ -c $ARG2$
}
define command{
    command_name check_local_mrtgtraf
    command_line $USER1$/check_mrtgtraf -F $ARG1$ -a $ARG2$ -w $ARG3$ -c $ARG4$ -e $ARG5$
}
define command{
    command_name check_ftp
    command_line $USER1$/check_ftp -H $HOSTADDRESS$ $ARG1$
}
define command{
    command_name check_hpjd
    command_line $USER1$/check_hpjd -H $HOSTADDRESS$ $ARG1$
}
define command{
    command_name check_snmp
    command_line $USER1$/check_snmp -H $HOSTADDRESS$ $ARG1$
}
define command{
    command_name check_http
    command_line $USER1$/check_http -I $HOSTADDRESS$ $ARG1$
}
define command{
    command_name check_ssh
    command_line $USER1$/check_ssh $ARG1$ $HOSTADDRESS$
}
define command{
    command_name check_dhcp
    command_line $USER1$/check_dhcp $ARG1$
}
define command{
    command_name check_ping
    command_line $USER1$/check_ping -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ -p 5
}
define command{
    command_name check_pop
    command_line $USER1$/check_pop -H $HOSTADDRESS$ $ARG1$
}
define command{
    command_name check_imap
    command_line $USER1$/check_imap -H $HOSTADDRESS$ $ARG1$
}
define command{
    command_name check_smtp
    command_line $USER1$/check_smtp -H $HOSTADDRESS$ $ARG1$
}
define command{
    command_name check_tcp
    command_line $USER1$/check_tcp -H $HOSTADDRESS$ -p $ARG1$ $ARG2$
}
define command{
    command_name check_udp
    command_line $USER1$/check_udp -H $HOSTADDRESS$ -p $ARG1$ $ARG2$
}
define command{
    command_name check_nt
    command_line $USER1$/check_nt -H $HOSTADDRESS$ -p 12489 -v $ARG1$ $ARG2$
}
define command{
    command_name process-host-perfdata
    command_line /usr/bin/printf "%b" "$LASTHOSTCHECK$\t$HOSTNAME$\t$HOSTSTATE$\t$HOSTATTEMPT$\t$HOSTSTATETYPE$\t$HOSTEXECUTIONTIME$\t$HOSTOUTPUT$\t$HOSTPERFDATA$\n" >> /var/log/nagios/host-perfdata.out
}
define command{
    command_name process-service-perfdata
    command_line /usr/bin/printf "%b" "$LASTSERVICECHECK$\t$HOSTNAME$\t$SERVICEDESC$\t$SERVICESTATE$\t$SERVICEATTEMPT$\t$SERVICESTATETYPE$\t$SERVICEEXECUTIONTIME$\t$SERVICELATENCY$\t$SERVICEOUTPUT$\t$SERVICEPERFDATA$\n" >> /var/log/nagios/service-perfdata.out
}
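Since process-service-perfdata writes one tab-separated record per check, the resulting file is easy to post-process. Below is a minimal Python sketch that splits such a record into named fields and pulls the label=value pairs out of the performance data. The field order follows the printf format of process-service-perfdata; parse_service_perfdata() and the sample record are made up for illustration, and quoted labels containing spaces are not handled.

```python
# Sketch: parse one line of service-perfdata.out as written by the
# process-service-perfdata command. parse_service_perfdata() is a hypothetical
# helper and the sample line is invented for the example.

FIELDS = ("last_check", "host", "service", "state", "attempt", "state_type",
          "execution_time", "latency", "output", "perfdata")

def parse_service_perfdata(line):
    """Return a dict of named fields plus a 'metrics' dict of label -> raw value."""
    record = dict(zip(FIELDS, line.rstrip("\n").split("\t")))
    metrics = {}
    for item in record.get("perfdata", "").split():
        label, _, rest = item.partition("=")
        # the value comes first; warn/crit/min/max follow after semicolons
        metrics[label] = rest.split(";")[0]
    record["metrics"] = metrics
    return record

sample = ("1463400000\tn01\tPING\tOK\t1\tHARD\t4.012\t0.101\t"
          "PING OK - Packet loss = 0%, RTA = 0.50 ms\t"
          "rta=0.500000ms;100.000000;500.000000;0.000000 pl=0%;20;60;0\n")
record = parse_service_perfdata(sample)
print(record["host"], record["service"], record["metrics"])
```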
The cgi.cfg configuration file serves several functions within Nagios. Among the most important is authentication, which restricts access to Nagios and its data to the appropriate people. Another is authorization, which determines the areas where a particular user can do damage ;-)
main_config_file=/etc/nagios/nagios.cfg
physical_html_path=/usr/share/nagios/html
url_html_path=/nagios
show_context_help=0
use_pending_states=1

use_authentication=1
use_ssl_authentication=0
authorized_for_system_information=nagiosadmin
authorized_for_configuration_information=nagiosadmin
authorized_for_system_commands=nagiosadmin
authorized_for_all_services=nagiosadmin
authorized_for_all_hosts=nagiosadmin
authorized_for_all_service_commands=nagiosadmin
authorized_for_all_host_commands=nagiosadmin

default_statuswrl_layout=4
ping_syntax=/bin/ping -n -U -c 5 $HOSTADDRESS$
refresh_rate=90
result_limit=100
escape_html_tags=1
action_url_target=_blank
notes_url_target=_blank
lock_author_names=1
navbar_search_for_addresses=1
navbar_search_for_aliases=1
The final configuration file we will consider is the resource.cfg file. It is used to define so-called site-specific macros, strangely named $USER1$ through $USER32$:
# $USER1$ = path to plugins directory
$USER1$=/usr/lib64/nagios/plugins
...
# Store a username and password (hidden)
$USER3$=admin
$USER4$=somepassword
The first macro defines the path to the Nagios plug-ins directory; this usage is assumed by the supplied sample configuration files.
The other two macros are used in this case to store a username and password. These items can be used in command definitions for added security. The resource.cfg file itself can be protected against all non-root access without compromising the ability of CGI programs to run successfully.
19 February 2009
When using m4 to configure Nagios, great advantages can be realized. One of the easiest places to gain an advantage by using m4 is when defining a new host.
Typically, a new host has not only a host definition but also a number of fairly standardized services, such as ping, FTP, telnet, SSH, and so forth. Thus, when defining a new host configuration, you have to add not only the new host but all of the relevant services as well, and possibly host extra info and service extra info too.
#----------------------------------------
# HOST: marco
#----------------------------------------
define host{
    use hpux-host ; Name of host template
    host_name marco
    address 192.168.4.1
}
define hostextinfo{
    host_name marco
    action_url http://marco-mp/
}
define service{
    use passive-service ; Name of servi
    host_name marco
    service_description System Load
    servicegroups Load
}
define service{
    use hpux-service ; Name of service
    host_name marco
    service_description PING
    check_command check_ping!100.0,20%!500.0,60%
}
define service{
    use hpux-service ; Name of service
    host_name marco
    service_description TELNET
    servicegroups TELNET
    check_command check_telnet
}
define serviceextinfo{
    host_name marco
    service_description TELNET
    action_url telnet://marco
}
define service{
    use hpux-service ; Name of service
    host_name marco
    service_description FTP
    servicegroups FTP
    check_command check_ftp
}
define service{
    use hpux-service ; Name of service
    host_name marco
    service_description NTP
    servicegroups NTP
    check_command check_ntp
}
define service{
    use hpux-service ; Name of service
    host_name marco
    service_description SSH
    servicegroups SSH
    check_command check_ssh
}
Compare that output with the m4 code that generated it:
DEFHPUX(`marco',`192.168.4.1')
Another benefit is that if DEFHPUX is coded correctly (with each service independent, such as an m4 macro DOSSH for SSH), then a single change to the m4 file, propagated to the Nagios config file, can alter a service for every HP-UX host (in this example).
Here is a possible definition of DEFHPUX:
define(`DEFHPUX',`
#----------------------------------------
# HOST: $1
#----------------------------------------
define host{
    use hpux-host ; Name of host template
    host_name $1
    address $2
}
define hostextinfo{
    host_name $1
    action_url http://$1-mp/
}'
DOLOAD(`$1')
DOPING(`$1')
DOTELNET(`$1')
DOFTP(`$1')
DONTP(`$1')
DOSSH(`$1'))
There is a lot more that m4 can do; this is just the tip of the iceberg.
Since Nagios configuration is somewhat involved, the package provides a command that can be used to verify the configuration before reloading it, because a wrong configuration will break Nagios. Here is an example of its use:
nagios -v /etc/nagios/nagios.cfg
This will check the Nagios configuration, which uses /etc/nagios/nagios.cfg as its main configuration file.
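The verify-then-reload habit is easy to script so that a broken configuration never reaches the running daemon. The Python sketch below is one possible wrapper, not an official tool: verify_then_reload() is a hypothetical helper, the reload command is the init script path used on this page, and it assumes that nagios -v exits with a non-zero status when verification fails.

```python
# Sketch: reload Nagios only if "nagios -v" accepts the configuration.
# verify_then_reload() is a hypothetical helper. It assumes "nagios -v"
# returns a non-zero exit status on configuration errors.
import subprocess

def verify_then_reload(config="/etc/nagios/nagios.cfg",
                       reload_cmd=("/etc/rc.d/init.d/nagios", "reload"),
                       run=subprocess.run):
    """Return True if the configuration verified and a reload was requested."""
    check = run(["nagios", "-v", config])
    if check.returncode != 0:
        return False          # leave the running daemon untouched
    run(list(reload_cmd))
    return True
```

The run parameter exists only so the logic can be exercised without a Nagios installation; in normal use you would call verify_then_reload() as root and act on the boolean it returns.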
Once you've verified your configuration files and fixed any errors, you can go ahead and (re)start Nagios Core using the command
/etc/rc.d/init.d/nagios reload
You can also do it from the Web interface.
System monitoring tool Nagios offers a powerful mechanism for receiving events and commands from external applications. External commands are usually sent from event handlers or from the Nagios Web interface. You will find external commands most useful when writing event handlers for your system, or when writing an external application that interacts with Nagios.
The external commands pipe is a pipe file created on a filesystem that Nagios uses to receive incoming messages. The communication does not use any authentication or authorization -- the only requirement is to have write access to the pipe file, rw/nagios.cmd, which is located in the directory passed as the localstatedir option during compilation.
An external command file is usually writable by the owner and the group; the usual group used is nagioscmd. If you want a user to be able to send commands to the Nagios daemon, simply add that user to this group.
A small limitation of the command pipe is that there is no way to get any results back, so it is not possible to send any query commands to Nagios. Therefore, by just using the command pipe, you have no verification that the command you have passed to Nagios has been processed, or will be processed soon. It is, however, possible to read the Nagios log file and check whether it indicates that the command has been parsed correctly.
The Nagios Web interface uses an external command pipe to control how Nagios works. The Web interface does not use any other means to send commands or apply changes to Nagios.
From the Nagios daemon perspective, there is no clear distinction as to who can perform what operations. Therefore, if you plan to use the external command pipe to allow users to submit commands remotely, you need to make sure that authorization is in place so that unauthorized users cannot send potentially dangerous commands to Nagios.
The syntax for formatting commands is easy. Each command must be placed on a single line and end with a newline character. The syntax is as follows:
[TIMESTAMP] COMMAND_NAME;argument1;argument2;...;argumentN
TIMESTAMP is written as Unix time -- that is, the number of seconds since 1970-01-01 00:00:00. You can create this by using the date command. Most programming languages also offer the means to get the current Unix time.
Commands are written in upper case. The arguments depend on the actual command. For example, to add a comment to a host stating that it has passed a security audit, you can use the following shell command:
echo "[$(date +%s)] ADD_HOST_COMMENT;somehost;1;Security Audit; This host has passed security audit on $(date +%Y-%m-%d)" >/var/nagios/rw/nagios.cmd
This will send an ADD_HOST_COMMENT command to Nagios over the external command pipe. Nagios will then add a comment to the host, somehost, stating that the comment originated from Security Audit.
- The first argument specifies the host name to add the comment to;
- the second tells Nagios if this comment should be persistent.
- The next argument describes the author of the comment, and the last argument specifies the actual comment text.
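The line format is simple enough that building it from another program takes only a few lines. Here is a hedged Python sketch: format_command() and send_command() are hypothetical helper names, and the default pipe path is the one used in the shell example on this page, which may differ on your installation.

```python
# Sketch: build and submit a Nagios external command.
# format_command() and send_command() are hypothetical helpers; the default
# pipe path is taken from the shell example and may differ on your system.
import time

def format_command(name, *args, now=None):
    """Return '[timestamp] NAME;arg1;...;argN' terminated by a newline."""
    timestamp = int(time.time() if now is None else now)
    return "[{}] {}\n".format(timestamp, ";".join([name, *map(str, args)]))

def send_command(line, pipe_path="/var/nagios/rw/nagios.cmd"):
    """Write one command line to the pipe (blocks until Nagios has it open)."""
    with open(pipe_path, "w") as pipe:
        pipe.write(line)

line = format_command("ADD_HOST_COMMENT", "somehost", 1, "Security Audit",
                      "This host has passed security audit", now=1206096000)
print(line, end="")
```

Keeping formatting and sending separate makes the formatting testable without a running daemon; only send_command() needs write access to the pipe.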
Similarly, adding a comment to a service requires the use of the ADD_SVC_COMMENT command. The command's syntax is similar to that of the ADD_HOST_COMMENT command except that the command requires the specification of the host name and service name.
You can also delete a single comment or all comments using the DEL_HOST_COMMENT, DEL_ALL_HOST_COMMENTS, and DEL_SVC_COMMENT or DEL_ALL_SVC_COMMENTS commands.
Other commands worth mentioning are related to scheduling checks on demand. Often, it is necessary to request that a check be carried out as soon as possible; for example, when testing a solution.
You can create a script that schedules a check of a host, all services on that host, and a service on a different host, as follows:
#!/bin/sh
NOW=$(date +%s)
echo "[$NOW] SCHEDULE_HOST_CHECK;somehost;$NOW" \
  >/var/nagios/rw/nagios.cmd
echo "[$NOW] SCHEDULE_HOST_SVC_CHECKS;somehost;$NOW" \
  >/var/nagios/rw/nagios.cmd
echo "[$NOW] SCHEDULE_SVC_CHECK;otherhost;Service Name;$NOW" \
  >/var/nagios/rw/nagios.cmd
exit 0
The commands SCHEDULE_HOST_CHECK and SCHEDULE_HOST_SVC_CHECKS accept a host name and the time at which the check should be scheduled. The SCHEDULE_SVC_CHECK command requires the specification of a service description as well as the name of the host to schedule the check on.
Normal scheduled checks, such as the ones scheduled above, might not actually take place at the time that you scheduled them. Nagios also takes allowed time periods into account and checks whether checks were disabled for a particular object or globally for the entire Nagios instance.
There are cases when you'll need to force Nagios to do a check -- in such cases, you should use the SCHEDULE_FORCED_HOST_CHECK, SCHEDULE_FORCED_HOST_SVC_CHECKS, and SCHEDULE_FORCED_SVC_CHECK commands. They work in exactly the same way as described above, but make Nagios skip both the checking of time periods and the test of whether checks are disabled for this particular object. This way, a check will always be performed, regardless of other Nagios parameters.
Other commands worth using are related to custom variables, introduced in Nagios 3. When you define a custom variable for a host, service, or contact, you can change its value on the fly with the external command pipe.
As these variables can then be directly used by check or notification commands and event handlers, it is possible to make other applications or event handlers change these attributes directly without modifications to the configuration files.
How might this work? Suppose that the IT staff registers its presence via an application without any GUI. This application periodically sends information about the latest known IP address, and that information is then passed to Nagios assuming that the person is in the office. This would later be sent to a notification command to use that specific IP address while sending a message to the user.
Assuming that the user name is jdoe and the custom variable name is DESKTOPIP, the message that would be sent to the Nagios external command pipe would be as follows:
[1206096000] CHANGE_CUSTOM_CONTACT_VAR;jdoe;DESKTOPIP;12.34.56.78
This would cause a subsequent use of $_CONTACTDESKTOPIP$ to return a value of 12.34.56.78.
Nagios offers the CHANGE_CUSTOM_CONTACT_VAR, CHANGE_CUSTOM_HOST_VAR, and CHANGE_CUSTOM_SVC_VAR commands for modifying custom variables in contacts, hosts, and services.
The commands explained above are just a small subset of the full capabilities of the Nagios external command pipe. For a complete list of commands, visit the External Command List.
Adapted from Remotely monitor servers with the Nagios check_by_ssh plugin by Vincent Danen TechRepublic.com, January 20th, 2009
Nagios is a monitoring system that can be used to monitor a wide variety of services and criteria. Remotely, it can monitor anything that can be accessed remotely: Web sites, SMTP servers, FTP servers, and so forth. Locally, it can monitor even more: load average, swap and memory usage, disk space usage, hard drive temperatures, and the like. In fact, Nagios’ extensible nature makes writing plugins a breeze, so it is possible to monitor anything for which you are able to get representable data.
Unfortunately, if you wish to monitor local resource usage on a remote site it can be a little trickier. There are a number of ways this can be done like using NRPE (Nagios Remote Plugin Executor). These solutions may be best if you are able to compile and install software on the other machine, but if that is not a possibility, there are other solutions.
One such solution is to execute checks via SSH. If you are able to access the remote machine via SSH and have the ability to run programs out of a home directory, and the ability to set an SSH public key, then the check_by_ssh plugin is perhaps your best bet.
The first step is to ensure that the central Nagios server is able to connect to the remote host via SSH in a manner that does not require a password. This would require creating a password-less public/private keypair as the user running the Nagios service (typically “nagios”), sending the public key to the remote server, and then (as user “nagios”) logging into the remote system. For example:
nagios@nagiosserver:~/ > $ ssh-keygen -t dsa
Generating public/private dsa key pair.
Enter file in which to save the key (/home/nagios/.ssh/id_dsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/nagios/. s
Your public key has been saved in /home/nagios/.ssh/id_dsa.pub.
The key fingerprint is:
6a:b4:cb:f1:7d:7b:7c:1b:c4:79:2a:5d:5a:16:da:b8 nagios@nagiosserver
nagios@nagiosserver:~/ > $ scp .ssh/id_dsa.pub user@remotehost:~/.ssh/authorized_keys
nagios@nagiosserver:~/ > $ ssh user@remotehost
user@remotehost:~/ > $
This creates the key without a passphrase and then copies the newly-created id_dsa.pub public key file to the remote host. Make sure that the ~user/.ssh directory already exists on the remote host and ensure that it is mode 0700 to protect it. If that is all correct, then using ssh to connect to the remote site as the specified user should yield a shell prompt. If so, then we can configure Nagios to use check_by_ssh.
One quick note: if you are able to create a dedicated account on the remote system for this, it would be best to do so. If, on the other hand, you are unable to, be sure to adequately protect your central Nagios server, because if anyone can obtain privileges as “nagios” on the central server, they will have an easy ticket to your user account on the remote server.
As well, copying whichever plugins you wish to execute on the remote machine into a ~/bin or ~/plugins directory would be the next step. To step up security, you can write a wrapper script to execute those specific commands and modify ~/.ssh/authorized_keys on the remote server to only execute the wrapper script, which would prevent that key from being used for anything other than executing Nagios checks.
On the central Nagios server, in the commands.cfg configuration file, define the new checks. The example below defines a new check_ssh_load command:
# 'check_ssh_load' command definition
define command {
    command_name check_ssh_load
    command_line $USER1$/check_by_ssh -H $HOSTADDRESS$ -C "/home/user/bin/check_load -w $ARG1$ -c $ARG2$"
}
This command will call the check_by_ssh plugin to connect to the specified host (via the $HOSTADDRESS$ macro) and execute the command /home/user/bin/check_load, which is the check_load plugin, on the remote machine; you will need to adjust the path to match the location of that plugin on the remote server. As well, if paths and/or usernames differ on remote servers and you plan to monitor more than one, you may need to define multiple commands, one for each server (or use macros).
Next, edit services.cfg and add the following:
define service {
    use local-service ; check current load on machine
    hostgroup_name ssh-nagios-services
    service_description Current Load
    check_command check_ssh_load!5.0,4.0,3.0!10.0,6.0,4.0
}
This defines a new service to execute for hosts in the ssh-nagios-services hostgroup. It calls the defined check_ssh_load command and will put the service in a warn state if the load average hits 5, and a critical state if it hits 10 (adjust to suit, of course).
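The two threshold triples passed to check_ssh_load apply to the 1, 5 and 15 minute load averages respectively. The Python sketch below models that decision and maps it onto the standard plug-in return codes (0 for OK, 1 for Warning, 2 for Critical). It illustrates the logic only and is not the check_load source: load_state() is a hypothetical helper, and whether the real plug-in compares with "greater than" or "greater than or equal" is an assumption here.

```python
# Sketch: map load averages onto Nagios plug-in return codes using
# warning and critical triples such as "5.0,4.0,3.0" and "10.0,6.0,4.0".
# load_state() is a hypothetical helper; the strict ">" comparison is an assumption.

OK, WARNING, CRITICAL = 0, 1, 2

def load_state(loads, warn, crit):
    """loads is a (1min, 5min, 15min) tuple; warn and crit are 'a,b,c' strings."""
    warn_levels = [float(value) for value in warn.split(",")]
    crit_levels = [float(value) for value in crit.split(",")]
    if any(load > limit for load, limit in zip(loads, crit_levels)):
        return CRITICAL
    if any(load > limit for load, limit in zip(loads, warn_levels)):
        return WARNING
    return OK

print(load_state((1.2, 0.9, 0.7), "5.0,4.0,3.0", "10.0,6.0,4.0"))  # quiet machine
print(load_state((5.5, 2.0, 1.0), "5.0,4.0,3.0", "10.0,6.0,4.0"))  # 1-minute load above warning
```

Note that a sustained load can trip the 15-minute critical threshold (4.0 in this example) even when the 1-minute figure looks harmless.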
Finally, edit hostgroups.cfg to create the ssh-nagios-services hostgroup. Systems added to this hostgroup will automatically begin to use the defined service.
define hostgroup {
    hostgroup_name ssh-nagios-services
    alias Nagios over SSH
    members remote1,remote2
}
Here we define that remote1 and remote2 both belong to this hostgroup. As a result, both will start using the check_ssh_load command.
Using check_by_ssh is a convenient and secure way to execute Nagios plugins on remote servers. When all you can see of the status of a remote server is HTTP or SMTP availability, your view of the server is quite restricted. Being able to see local resource usage can allow you to spot problems, and correct them, before they are visible to users.
Expect to the rescue! Expect is a nifty program that automates tasks based on input and output. You should have installed Expect on your computer by now. Using heavily Googled code, I concocted my own version of the sshexpect script:
#!/usr/bin/expect -f
# Expect script to supply root/admin password for remote ssh server
# and execute command.
# This script needs three arguments to connect to the remote server:
# ipaddr = IP address of remote UNIX server, no hostname
# password = Password of remote UNIX server, for root user
# scriptname = Path to remote script which will execute on remote server
# For example:
# ./sshlogin.exp 192.168.1.11 password who
#------------------------------------------------------------------------
# Copyright (c) 2004 nixCraft project <http://cyberciti.biz/fb/>
# This script is licensed under GNU GPL version 2.0 or above
#------------------------------------------------------------------------
# This script is part of nixCraft shell script collection (NSSC)
# Visit http://bash.cyberciti.biz/ for more information.
#------------------------------------------------------------------------
# set variables
set ipaddr [lrange $argv 0 0]
set password [lrange $argv 1 1]
set scriptname [lrange $argv 2 2]
set arg1 [lrange $argv 3 3]
set arg2 [lrange $argv 4 4]
set arg3 [lrange $argv 5 5]
set arg4 [lrange $argv 6 6]
set arg5 [lrange $argv 7 7]
set arg6 [lrange $argv 8 8]
set arg7 [lrange $argv 9 9]
# timeout for the password prompt, 5 seconds longer than the SSH ConnectTimeout parameter
set timeout 35
# now connect to remote UNIX box (ipaddr) with given script to execute
set pid [spawn -noecho ssh -o "ConnectTimeout 30" -o "CheckHostIP no" -o "StrictHostKeyChecking no" $ipaddr $scriptname $arg1 $arg2 $arg3 $arg4 $arg5 $arg6 $arg7]
match_max 5000
# look for password prompt
log_user 0
expect {
    "denied" {puts "CRITICAL: wrong SSH password" ; exit 2}
    "Name or service not known" {puts "CRITICAL: cannot resolve SSH server name $ipaddr" ; exit 2}
    "Connection refused" {puts "CRITICAL: SSH connection to $ipaddr refused" ; exit 2}
    "Connection timed out" {puts "CRITICAL: SSH connection to $ipaddr timed out" ; exit 2}
    timeout {puts "CRITICAL: SSH server timed out while prompting for password" ; exit 2}
    "?assword:"
}
# send password
send -- "$password"
# send carriage return to submit it
send -- "\r"
expect "\r"
log_user 1
# now we wait up to 30 seconds
set timeout 30
expect {
    timeout {puts "CRITICAL: execution of $scriptname timed out after 30 seconds" ; exit 2}
    eof
}
set waitret [wait]
catch {close}
set state [lindex $waitret 2]
exit [lindex $waitret 3]
What this script does is fairly easy to understand (once it's been explained to you!). It starts ssh with the passed arguments (a maximum of 8), against the server you specify and with a password you specify as well. It returns the exit status of the remotely invoked command.
The script suppresses any SSH output not related to the command, so beware: if the password is wrong, you will not be told. The script also makes SSH skip the host key prompt, so if you're finicky about security, perhaps this is the wrong approach for you. But it works for me, so let's go on. Again, keep reading.
The Nagios Remote Plugin Executor (NRPE for short) is a hack that provides Nagios with an agent capable of executing programs on a remote host at the request of the central host. These are usually plugins that test the corresponding computer locally and therefore must be installed on this remote server. But the use of NRPE is not restricted to local plugins; any plugin at all can be executed, including those intended to test network services, for example to indirectly test computers that are not reachable from the Nagios server, or to take load off the Nagios server if the number of monitored nodes is high.
This is an alternative to running plugins via SSH, and it saves some of the connection setup time between the server and the remote machine that is inherent in SSH-based invocation. The SSH method, however, requires an account with a local shell, which allows any command to be run on the target host from this account and so creates an opportunity for abuse. The Remote Plugin Executor, on the other hand, is restricted to the commands configured.
NRPE is installed on the target host and run as a service via the inet daemon. When NRPE receives a query from the Nagios server on the (configurable) TCP port 5666, it runs the command configured for that query. As with the Secure Shell method, the plugin that performs the test must be installed on the target host (in most cases that simply means copying it to the appropriate directory and configuring it in the NRPE configuration file) or be available via NFS or another file sharing system.
NRPE, however, requires more configuration work than the Secure Shell. In addition to the Nagios configuration and the installation of the check_nrpe plugin on the Nagios server (which sends requests to execute plugins remotely to all selected clients), the following changes are required on each client to run NRPE plugins:
1. nrpe must be installed.
2. The inet daemon (inetd or xinetd) must be configured to run nrpe as a service with administrator privileges.
These days tasks (1) and (2) are accomplished by installing the RPM nagios-plugins-nrpe. In this case the config file will be at /etc/nagios/nrpe.cfg and the plugins at /usr/lib64/nagios/plugins.
Instead of installing selected plugins you can install them all on the client and later synchronize via rsync with NAGIOS server. So the main manual task is configuration of plugins in NRPE configuration file.
Extracted from: Nagios Howto: Using NRPE To Monitor Remote Services. This whitepaper is a continuation of the previous article, Nagios Howto: Notification Escalations, EventHandlers & Remote Service Monitoring With NRPE.
As previously mentioned, our focus assumes the use of Linux and a working Nagios installation. I highly suggest you go back and read the previous Nagios howto, as it contains important information that we will be building upon as we move into the second part of this whitepaper.
Thank you for rejoining if you have already read the first Crucial Nagios whitepaper.
As you have likely seen, the Nagios docs leave a bit to be desired when it comes to information on the NRPE plugin. In its simplest form, the NRPE plugin allows you to monitor any number of remote network devices and services using a single Nagios installation. However, when we combine EventHandlers with NRPE we gain the ability to repair our remote servers automatically, giving us self-healing servers. For now, we will focus our attention on NRPE and walk through the steps to properly configure your NRPE daemon.
The NRPE source code and the default plugins are available from the Nagios website. You will need to download the NRPE plugin and any other plugins to the remote machine that you intend to monitor:
cd /usr/src
wget http://umn.dl.sourceforge.net/sourceforge/nagiosplug/nagios-plugins-1.4.6.tar.gz
tar zxvf nagios-plugins-1.4.6.tar.gz
cd nagios-plugins-1.4.6
The instructions above will download and extract the Nagios plugins, as well as change into that directory.
Build The Source Code
We now need to build the source code. This step needs to be done on each remote system that you plan to monitor. Follow the steps below to build the default plugin set:
./configure --prefix=/usr/local/nagios
make
make install
We now have /usr/local/nagios/libexec/, which contains the default plugin set.
At this time we need to download and install the NRPE daemon and plugin. The steps below detail the commands needed for execution:
cd /usr/src/
wget http://internap.dl.sourceforge.net/sourceforge/nagios/nrpe-2.7.tar.gz
tar zxvf nrpe-2.7.tar.gz
cd nrpe-2.7
./configure
make all
Move Things Around
Now we need to manually move the files into place:
cp src/nrpe /usr/local/nagios/libexec/
cp src/check_nrpe /usr/local/nagios/libexec/
cp sample-config/nrpe.cfg /usr/local/nagios/libexec/
We now have our executables in place and are ready to begin configuring the NRPE daemon on the remote system.
Configuration
The sample configuration file we copied above is a well documented file. You should take the time to read this file and familiarize yourself with the configuration options that we will be setting below. Open the nrpe.cfg file in the Nagios libexec directory in your favorite editor.
We are going to leave some default settings and change a few settings for our needs. Set the following configuration options as follows:
pid_file=/var/run/nrpe.pid
server_port=5666
# Set this if you want to nail NRPE to specific IP address
# server_address=192.168.1.1
nrpe_user=nagios
nrpe_group=nagios
# Set this to the IP address of the remote Nagios installation
allowed_hosts=127.0.0.1
dont_blame_nrpe=0
# command_prefix=/usr/bin/sudo
# Set this to 1 for logging in syslog
debug=0
command_timeout=60
connection_timeout=300
# allow_weak_random_seed=1
That's it for the configuration of NRPE.
Commands
We now need to look at the commands available to NRPE. If you scroll to the bottom of the nrpe.cfg file you will see the default commands. The commands are structured like so:
command[check_disk1]=/usr/local/nagios/libexec/check_disk -w 20 -c 10 -p /dev/hda1
Command names are completely arbitrary and can be created on the fly, e.g.:
command[check_disk2]=/usr/local/nagios/libexec/check_disk -w 20 -c 10 -p /dev/hdb1
The format is very simple: check_disk1 is the command name, and it runs the plugin located at /usr/local/nagios/libexec/check_disk with the arguments -w 20 -c 10 -p /dev/hda1. I used this particular command because the disk check is the one command that you may need to alter immediately for effective use. At the end of the command we see the path of the disk device to check, /dev/hda1. You may not have this drive configuration, so you will need to replace it with the path to your local disk setup. An easy way to figure this out is to issue the command df -h and use the device listed for /home, as this is the primary usage space for most.
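To make the df step concrete, here is a small sketch that reads the device for a path from POSIX-format df output and prints the matching nrpe.cfg line. The function names are made up for illustration, and the thresholds are simply the ones used in the example above:

```shell
#!/bin/bash
# Hypothetical helpers; neither function is part of NRPE.

# Read `df -P <path>` output on stdin and print the device column
# of the first data line (line 2, after the header).
device_from_df() {
    awk 'NR == 2 { print $1 }'
}

# Print an nrpe.cfg command definition for the given name and device.
nrpe_disk_command() {
    local name="$1" device="$2"
    printf 'command[%s]=/usr/local/nagios/libexec/check_disk -w 20 -c 10 -p %s\n' \
        "$name" "$device"
}

# Example: nrpe_disk_command check_disk1 "$(df -P /home | device_from_df)"
```

Using df -P rather than df -h keeps each filesystem on a single line, which makes the device column safe to pick out with awk.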
System Setup
At this point, we have completed configuring NRPE and we need to set up the system to accommodate Nagios.
First we need to set up permissions for the Nagios user.
adduser nagios
chown -R nagios:nagios /usr/local/nagios/
We've set up our Nagios user and changed the ownership of all the files under the nagios/ directory.
Now we need to edit the file /etc/services and add the following line:
nrpe 5666/tcp # NRPE
Now, we need to tell our inetd or xinetd about NRPE. Create a file in /etc/xinetd.d/ called nrpe, and add the following to that file:
# default: on
# description: NRPE
service nrpe
{
flags = REUSE
socket_type = stream
wait = no
user = nagios
server = /usr/local/nagios/libexec/nrpe
server_args = -c /usr/local/nagios/libexec/nrpe.cfg --inetd
log_on_failure += USERID
disable = no
# Change this to your primary Nagios server
only_from = 127.0.0.1
}
This describes to the "super server" the various options necessary to launch the NRPE daemon when our remote Nagios monitoring system connects.
Now, open the /etc/hosts.allow file and add an entry for the IP address of your remote monitoring server. If you have a firewall, you will also want to configure it so that you allow remote connections from the IP address of your remote monitoring system to port 5666.
Restart your xinetd daemon to reload the configuration changes:
/etc/init.d/xinetd reload
Let's test it out real quick to make sure nothing has gone wrong so far. From your remote monitoring server issue the following command:
telnet ip.address.of.remote.nrpe 5666
If the connection immediately closes you've got a problem and something isn't right. If the socket opens and you are met with the following:
Escape character is '^]'.
Then you're ready to move on. If you've got problems at this point, go back through each of the steps above and check for any errors in configuration. If you set debug=1 in nrpe.cfg, you can also check your syslog file for failure information.
Add New Host
We are now ready to add our new host to our primary Nagios installation. This is very straightforward and should only take a moment.
Back on the primary Nagios installation server we need to edit our hosts.cfg configuration file. The file is located in /usr/local/nagios/etc/hosts.cfg. This may change depending on your installation and organization of configuration files. Read the first part of this whitepaper for advice on organization.
In the hosts.cfg file, add your new host object:
define host{
use generic-host
# Hostname of remote system
host_name host.domain.com
# A friendly name for this server
alias Friendly name
# Remote host IP address
address 127.0.0.1
check_command check-host-alive
max_check_attempts 10
notification_interval 30
notification_period 24x7
notification_options d,r
# Your defined contact group name
contact_groups admins
}
At this time our hosts.cfg file contains two host objects: the localhost, which is running the Nagios application, and our remote host, which we will be monitoring.
We now want to add the service objects to our services.cfg file located in the same directory. Add the following single service to your services.cfg file:
define service{
use generic-service
# Hostname of remote system
host_name host.domain.com
service_description Primary Disk Usage
is_volatile 0
check_period 24x7
max_check_attempts 3
normal_check_interval 5
retry_check_interval 1
# Change to your contact group
contact_groups admins
notification_options w,u,c,r
notification_interval 10
notification_period 24x7
check_command check_nrpe!check_disk1
}
You can view the Nagios documentation for the full details on each of these object configuration options. You will likely want to adapt the values shown above to your monitoring environment. However, we will take a look at that last line, the check_nrpe option.
check_nrpe
When monitoring remote services, we first issue a check_nrpe command followed by a ! and the command on the remote machine to run. This means that we are going to need an instance of check_nrpe on our Nagios Server. Simply follow the directions above to download, build, and install the NRPE check_nrpe script and the nrpe daemon. Once you have installed these on the Nagios primary server, then we can proceed.
Now that nrpe is installed on the primary Nagios server, and our new host and its service are configured, we can reload the nagios service:
/etc/init.d/nagios reload
Web Interface
With the configuration read, you should now be able to access the web interface of Nagios. Under the Service Detail link you should see both the new remote host and the server/service we have setup to monitor. It is likely in an Unknown State at this time as the service has not been checked yet.
According to our service definition above, this service will be checked once every five minutes. If all has gone well, we should see the green in less than 5 minutes, which confirms proper installation and configuration of NRPE. In failure the service will go into a Soft State for two additional minutes. Once a Hard Failure state is achieved, you will see red and you should be able to check your Nagios log file in nagios/var/nagios.log for further information.
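The two-minute figure follows from the service definition: after the first failed check Nagios retries until max_check_attempts is reached, waiting retry_check_interval between attempts. A quick arithmetic sketch, assuming the default interval_length of 60 seconds so that intervals are in minutes (the function name is hypothetical):

```shell
#!/bin/bash
# Minutes spent in a soft state before the problem becomes hard:
# (max_check_attempts - 1) retries, each retry_check_interval apart.
minutes_to_hard_state() {
    local max_attempts="$1" retry_interval="$2"
    echo $(( (max_attempts - 1) * retry_interval ))
}

minutes_to_hard_state 3 1   # values from the service definition above
```

With max_check_attempts 3 and retry_check_interval 1 this gives 2 minutes; raising either value delays the hard state, and with it the first notification.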
There are a lot of moving parts with this project, so it is best to focus on a single server and a single service. Once you have a service properly configured it is a short step to configure the next service. Simply copy the service object created above and change the command named after check_nrpe! in its check_command line.
What You Can Do
Many things can be monitored with NRPE that can not be monitored remotely by Nagios without NRPE. These include:
- Disk space
- Zombie processes
- Number of shell users
- Total processes
- Load average
And anything else that doesn't run as a public service on the server.
Obviously, the advantage to remotely monitoring these server objects in a central location is that a problem may be much more quickly identified. This combined with the previous whitepaper's escalation procedures provide an effective response tool for reactively monitoring remote servers.
Remember, any of the commands that you have in the nagios/libexec folder are available to NRPE. To run these commands on the remote server, you simply need to setup the command in the nrpe.cfg file on the remote server. Here is an example using check_load:
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
The -w and -c options set the warning and critical thresholds.
Follow the steps above to add the new service, check_load, to your services.cfg file. Reload Nagios and that's that.
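For readers writing their own checks, the threshold logic can be sketched in a few lines of bash. This is an illustration only, not the check_load source: the function name check_value is made up, and treating a value equal to a threshold as a breach is an assumption.

```shell
#!/bin/bash
# Hypothetical helper showing how -w/-c style thresholds map to plugin states.
# Prints one status line (text|performance data) and returns the exit code
# Nagios expects: 0 OK, 1 Warning, 2 Critical, 3 Unknown.
check_value() {
    local value="$1" warn="$2" crit="$3" arg code state
    local number='^[0-9]+([.][0-9]+)?$'

    for arg in "$value" "$warn" "$crit"; do
        if ! [[ $arg =~ $number ]]; then
            echo "UNKNOWN - non-numeric input"
            return 3
        fi
    done

    # awk handles the floating-point comparison that bash cannot do itself.
    code="$(awk -v v="$value" -v w="$warn" -v c="$crit" \
        'BEGIN { if (v + 0 >= c + 0) print 2; else if (v + 0 >= w + 0) print 1; else print 0 }')"

    case "$code" in
        0) state="OK" ;;
        1) state="WARNING" ;;
        *) state="CRITICAL" ;;
    esac
    echo "$state - value is $value|value=$value;$warn;$crit"
    return "$code"
}
```

The text after the pipe character is performance data, which the Nagios daemon separates from the status text. A script built around this function could be registered in nrpe.cfg in the same way as check_load above.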
EventHandler
In the next whitepaper, we will change our focus to the Nagios EventHandler. I will demonstrate to you how to repair problems that Nagios encounters before even contacting a single human. At Crucial Web Hosting, we make extensive use of the EventHandler object in Nagios and we credit it for a very happy support team. Using the EventHandler objects we can diagnose and repair common problems that occur on local and remote servers in a matter of minutes and seconds as opposed to hours and days.
We will be performing root tasks using the sudo method and we will create a simple custom EventHandler on a remote server thus demonstrating how you can roll your own Nagios plugins.
In the next whitepaper you will learn how to make your servers heal themselves with no human interaction!
If you missed the first whitepaper in the series I am writing, you can access it here.
Mar 23, 2020 | linuxconfig.org
In this tutorial you will learn:
- How to install NRPE on Debian/Red Hat based distributions
- How to configure NRPE to accept commands from the server
- How to configure a custom check on the server and client side
Sep 13, 2019 | linuxconfig.org
... ... ...
We can also include our own custom configuration file(s) in our custom packages, thus allowing client monitoring configuration to be updated in a centralized and automated way. Keeping that in mind, we'll configure the client in /etc/nrpe.d/custom.cfg on all distributions in the following examples.
NRPE does not accept commands from any host other than localhost by default. This is for security reasons. To allow command execution from a server, we need to set the server's IP address as an allowed address. In our case the server is a Nagios server, with IP address 10.101.20.34. We add the following to our client configuration:
allowed_hosts=10.101.20.34
Multiple addresses or hostnames can be added, separated by commas. Note that the above logic requires a static address for the monitoring server. Using dhcp on the monitoring server will surely break your configuration if you use an IP address here. The same applies to the scenario where you use hostnames and the client can't resolve the server's hostname.
Configuring a custom check on the server and client side
To demonstrate our monitoring setup's capabilities, let's say we would like to know if the local postfix system delivers a mail on a client for user root. The mail could contain a cronjob output, some report, or something that is written to STDERR and is delivered as a mail by default. For instance, abrt sends a crash report to root by default on a process crash. We did not set up a mail relay, but we still would like to know if a mail arrives. Let's write a custom check to monitor that.
- Our first piece of the puzzle is the check itself. Consider the following simple bash script called check_unread_mail:
#!/bin/bash
USER=root
if [ "$(command -v finger >> /dev/null; echo $?)" -gt 0 ]; then
    echo "UNKNOWN: utility finger not found"
    exit 3
fi
if [ "$(id "$USER" >> /dev/null ; echo $?)" -gt 0 ]; then
    echo "UNKNOWN: user $USER does not exist"
    exit 3
fi
## check for mail
if [ "$(finger -pm "$USER" | tail -n 1 | grep -ic "No mail.")" -gt 0 ]; then
    echo "OK: no unread mail for user $USER"
    exit 0
else
    echo "WARNING: unread mail for user $USER"
    exit 1
fi
This simple check uses the finger utility to check for unread mail for user root. Output of finger -pm may vary by version and thus distribution, so some adjustments may be needed.
For example on Fedora 30, the last line of the output of finger -pm <username> is "No mail.", but on openSUSE Leap 15.1 it would be "No Mail." (notice the upper case Mail). In this case grep -i handles this difference, but it shows well that when working with different distributions and versions, some additional work may be needed.
- We'll need finger to make this check work. The package's name is the same on all distributions, so we can install it with apt, zypper, dnf or yum.
- We need to set the check executable:
# chmod +x check_unread_mail
- We'll place the check into the /usr/lib64/nagios/plugins directory, the common place for nrpe checks. We'll reference it later.
- We'll call our command check_mail_root. Let's place another line into our custom client configuration, where we tell nrpe what commands we accept, and what needs to be done when a given command arrives:
command[check_mail_root]=/usr/lib64/nagios/plugins/check_unread_mail
- With this our client configuration is complete. We can start the service on the client with systemd. The service name is nagios-nrpe-server on Debian derivatives, and simply nrpe on other distributions.
# systemctl start nagios-nrpe-server
# systemctl status nagios-nrpe-server
● nagios-nrpe-server.service - Nagios Remote Plugin Executor
   Loaded: loaded (/lib/systemd/system/nagios-nrpe-server.service; enabled; vendor preset: enabled)
   Active: active (running) since Tue 2019-09-10 13:03:10 CEST; 1min 51s ago
     Docs: http://www.nagios.org/documentation
 Main PID: 3782 (nrpe)
    Tasks: 1 (limit: 3549)
   CGroup: /system.slice/nagios-nrpe-server.service
           └─3782 /usr/sbin/nrpe -c /etc/nagios/nrpe.cfg -f
szept 10 13:03:10 mail-test-client systemd[1]: Started Nagios Remote Plugin Executor.
szept 10 13:03:10 mail-test-client nrpe[3782]: Starting up daemon
szept 10 13:03:10 mail-test-client nrpe[3782]: Server listening on 0.0.0.0 port 5666.
szept 10 13:03:10 mail-test-client nrpe[3782]: Server listening on :: port 5666.
szept 10 13:03:10 mail-test-client nrpe[3782]: Listening for connections on port 5666
- Now we can configure the server side. If we don't have one already, we can define a command that calls a remote nrpe instance with a command as its sole argument:
# this command runs a program $ARG1$ with no arguments
define command {
    command_name check_nrpe_1arg
    command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -t 60 -c $ARG1$ 2>/dev/null
}
- We also define the client as a host:
define host {
    use linux-server
    host_name mail-test-client
    alias mail-test-client
    address mail-test-client
}
The address can be an IP address or hostname. In the latter case we need to ensure it can be resolved by the monitoring server.
- We can define a service on the above host using the Nagios side command and the client side command:
define service {
    use generic-service
    host_name mail-test-client
    service_description OS:unread mail for root
    check_command check_nrpe_1arg!check_mail_root
}
These adjustments can be placed in any configuration file the Nagios server reads on startup, but it is a good practice to keep configuration files tidy.
- We verify our new Nagios configuration:
# nagios -v /etc/nagios/nagios.cfg
If "Things look okay", we can apply the configuration with a server reload:
Feb 07, 2019 | lintut.com
Nagios is open source software used for network and infrastructure monitoring. Nagios will monitor servers, switches, applications and services. It alerts the system administrator when something goes wrong, and again when the issue has been rectified.
View also: How to Enable EPEL Repository for RHEL/CentOS 6/5
yum install nagios nagios-devel nagios-plugins* gd gd-devel httpd php gcc glibc glibc-common
By default, when nagios is installed with yum, the cgi.cfg file names nagiosadmin as the authorized user, and /etc/nagios/passwd is used as the htpasswd file. To keep the steps easy I am using the same name.
# htpasswd -c /etc/nagios/passwd nagiosadmin
Check the values given below in /etc/nagios/cgi.cfg:
nano /etc/nagios/cgi.cfg
# AUTHENTICATION USAGE
use_authentication=1
# SYSTEM/PROCESS INFORMATION ACCESS
authorized_for_system_information=nagiosadmin
# CONFIGURATION INFORMATION ACCESS
authorized_for_configuration_information=nagiosadmin
# SYSTEM/PROCESS COMMAND ACCESS
authorized_for_system_commands=nagiosadmin
# GLOBAL HOST/SERVICE VIEW ACCESS
authorized_for_all_services=nagiosadmin
authorized_for_all_hosts=nagiosadmin
# GLOBAL HOST/SERVICE COMMAND ACCESS
authorized_for_all_service_commands=nagiosadmin
authorized_for_all_host_commands=nagiosadmin
To give the nagiosadmin user access over HTTP, the file /etc/httpd/conf.d/nagios.conf exists. Below is the nagios.conf configuration for the Nagios server.
cat /etc/httpd/conf.d/nagios.conf
# SAMPLE CONFIG SNIPPETS FOR APACHE WEB SERVER
# Last Modified: 11-26-2005
#
# This file contains examples of entries that need
# to be incorporated into your Apache web server
# configuration file. Customize the paths, etc. as
# needed to fit your system.
ScriptAlias /nagios/cgi-bin/ "/usr/lib/nagios/cgi-bin/"
<Directory "/usr/lib/nagios/cgi-bin/">
# SSLRequireSSL
Options ExecCGI
AllowOverride None
Order allow,deny
Allow from all
# Order deny,allow
# Deny from all
# Allow from 127.0.0.1
AuthName "Nagios Access"
AuthType Basic
AuthUserFile /etc/nagios/passwd
Require valid-user
</Directory>
Alias /nagios "/usr/share/nagios/html"
<Directory "/usr/share/nagios/html">
# SSLRequireSSL
Options None
AllowOverride None
Order allow,deny
Allow from all
# Order deny,allow
# Deny from all
Allow from 127.0.0.1
AuthName "Nagios Access"
AuthType Basic
AuthUserFile /etc/nagios/passwd
Require valid-user
</Directory>
Start httpd and nagios:
/etc/init.d/httpd start
/etc/init.d/nagios start
Note: SELinux and iptables are disabled.
Access the Nagios server at http://nagios_server_ip-address/nagios and log in with the username nagiosadmin and the password you set for the nagiosadmin user.
Nov 12, 2017 | www.howtoforge.com
Installing Nagios 3.4.4 On CentOS 6.3
Introduction
Nagios is a monitoring tool under GPL licence. This tool lets you monitor servers, network hardware (switches, routers, ...) and applications. A lot of plugins are available and its big community makes Nagios the biggest open source monitoring tool. This tutorial shows how to install Nagios 3.4.4 on CentOS 6.3.
Prerequisites
After installing your CentOS server, you have to disable selinux and install some packages to make nagios work.
To disable selinux, open the file: /etc/selinux/config
# vi /etc/selinux/config
# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
# enforcing - SELinux security policy is enforced.
# permissive - SELinux prints warnings instead of enforcing.
# disabled - No SELinux policy is loaded.
# (change the value below to disabled)
SELINUX=permissive
# SELINUXTYPE= can take one of these two values:
# targeted - Targeted processes are protected,
# mls - Multi Level Security protection.
SELINUXTYPE=targeted
Now, download all packages you need:
# yum install gd gd-devel httpd php gcc glibc glibc-common
Nagios Installation
Create a directory:
# mkdir /root/nagios
Navigate to this directory:
# cd /root/nagios
Download nagios-core & plugin:
# wget http://prdownloads.sourceforge.net/sourceforge/nagios/nagios-3.4.4.tar.gz
# wget http://prdownloads.sourceforge.net/sourceforge/nagiosplug/nagios-plugins-1.4.16.tar.gz
Untar nagios core:
# tar xvzf nagios-3.4.4.tar.gz
Go to the nagios dir:
# cd nagios
Configure before make:
# ./configure
Make all necessary files for Nagios:
# make all
Installation:
# make install
# make install-init
# make install-commandmode
# make install-config
# make install-webconf
Create a password to log into the web interface:
# htpasswd -c /usr/local/nagios/etc/htpasswd.users nagiosadmin
Start the service and start it on boot:
# chkconfig nagios on
# service nagios start
Now, you have to install the plugins:
# cd ..
# tar xvzf nagios-plugins-1.4.16.tar.gz
# cd nagios-plugins-1.4.16
# ./configure
# make
# make install
Start the apache service and enable it on boot:
# service httpd start
# chkconfig httpd on
Now, connect to your nagios system:
http://Your-Nagios-IP/nagios and enter login : nagiosadmin & password you have chosen above.
And after the installation?
After the installation you have to configure all your hosts and services in the Nagios configuration files. This step is performed on the command line and is complicated, so I recommend installing a tool like Centreon, a beautiful front-end for adding your hosts and services.
To go further, I recommend you to read my article on Nagios & Centreon monitoring .
Nov 12, 2017 | www.tecmint.com
Requirements
Step 1: Install Pre-requirements for Nagios
1. Before installing Nagios Core from sources in Ubuntu or Debian, first install the following LAMP stack components in your system, without the MySQL RDBMS database component, by issuing the below command.
# apt install apache2 libapache2-mod-php7.0 php7.0
2. On the next step, install the following system dependencies and utilities required to compile and install Nagios Core from sources, by issuing the following command.
# apt install wget unzip zip autoconf gcc libc6 make apache2-utils libgd-dev
Step 2: Install Nagios 4 Core in Ubuntu and Debian
3. On the first step, create the nagios system user and group and add the Apache www-data user to the nagios group, by issuing the below commands.
# useradd nagios
# usermod -a -G nagios www-data
4. After all dependencies, packages and system requirements for compiling Nagios from sources are present in your system, go to the Nagios webpage and grab the latest version of the Nagios Core stable source archive by issuing the following command.
# wget https://assets.nagios.com/downloads/nagioscore/releases/nagios-4.3.4.tar.gz
5. Next, extract the Nagios tarball and enter the extracted nagios directory, with the following commands. Issue the ls command to list the nagios directory content.
# tar xzf nagios-4.3.4.tar.gz
# cd nagios-4.3.4/
# ls
6. Now, start to compile Nagios from sources by issuing the below commands. Make sure you configure Nagios with Apache sites-enabled directory configuration by issuing the below command.
# ./configure --with-httpd-conf=/etc/apache2/sites-enabled
7. In the next step, build Nagios files by issuing the following command.
# make all
8. Now, install Nagios binary files, CGI scripts and HTML files by issuing the following command.
# make install
9. Next, install the Nagios daemon init and external command mode configuration files and make sure you enable the nagios daemon system-wide by issuing the following commands.
# make install-init
# make install-commandmode
# systemctl enable nagios.service
10. Next, run the following command to install some Nagios sample configuration files needed by Nagios to run properly.
# make install-config
11. Also, install the Nagios configuration file for the Apache web server, which can be found in the /etc/apache2/sites-enabled/ directory, by executing the below command.
# make install-webconf
12. Next, create the nagiosadmin account and a password for this account, needed by the Apache server to log in to the Nagios web panel, by issuing the following command.
# htpasswd -c /usr/local/nagios/etc/htpasswd.users nagiosadmin
13. To allow the Apache HTTP server to execute Nagios cgi scripts and to access the Nagios admin panel via HTTP, first enable the cgi module in Apache, then restart the Apache service and start and enable the Nagios daemon system-wide by issuing the following commands.
# a2enmod cgi
# systemctl restart apache2
# systemctl start nagios
# systemctl enable nagios
14. Finally, log in to the Nagios Web Interface by pointing a browser to your server's IP address or domain name at the following URL address via HTTP protocol. Log in to Nagios with the nagiosadmin user and the password set up with the htpasswd command.
http://IP-Address/nagios OR http://DOMAIN/nagios
Jan 26, 2012 | sanctum.geek.nz
Nagios is useful for monitoring pretty much any kind of network service, with a wide variety of community-made plugins to test pretty much anything you might need. However, its configuration and interface can be a little bit cryptic to initiates. Fortunately, Nagios is well-packaged in Debian and Ubuntu and provides a basic default configuration that is instructive to read and extend.
There's a reason that a lot of system administrators turn into monitoring fanatics when tools like Nagios are available. The rapid feedback of things going wrong and being fixed and the pleasant sea of green when all your services are up can get addictive for any halfway dedicated administrator.
In this article I'll walk you through installing a very simple monitoring setup on a Debian or Ubuntu server. We'll assume you have two computers in your home network, a workstation on 192.168.1.1 and a server on 192.168.1.2, and that you maintain a web service of some sort on a remote server, for which I'll use www.example.com.
We'll install a Nagios instance on the server that monitors both local services and the remote webserver, and emails you if it detects any problems.
For those not running a Debian-based GNU/Linux distribution or perhaps BSD, much of the configuration here will still apply, but the initial setup will probably be peculiar to your ports or packaging system unless you're compiling from source.
Installing the packages
We'll work on a freshly installed Debian Stable box as the server, which at the time of writing is version 6.0.3 "Squeeze". If you don't have it working already, you should start by installing Apache HTTPD:
# apt-get install apache2
Visit the server on http://192.168.1.1/ and check that you get the "It works!" page, and that should be all you need. Note that by default this installation of Apache is not terribly secure, so you shouldn't allow access to it from outside your private network until you've locked it down a bit, which is outside the scope of this article.
Next we'll install the nagios3 package, which will include a default set of useful plugins, and a simple configuration. The list of packages it needs to support these is quite long so you may need to install a lot of dependencies, which apt-get will manage for you.
# apt-get install nagios3
The installation procedure will include requesting a password for the administration area; provide it with a suitable one. You may also get prompted to configure a workgroup for the samba-common package; don't worry, you aren't installing a samba service by doing this, it's just information for the smbclient program in case you want to monitor any SMB/CIFS services.
That should provide you with a basic self-monitoring Nagios setup. Visit http://192.168.1.1/nagios3/ in your browser to verify this; use the username nagiosadmin and the password you gave during the install process. If you see something like the below, you're in business; this is the Nagios web reporting and administration panel.
The Nagios administration area's front page
Default setup
To start with, click the Services link in the left menu. You should see something like the below, which is the monitoring for localhost and the service monitoring that the packager set up for you by default:
Default Nagios monitoring hosts and services
Note that on my system, monitoring for the already-existing HTTP and SSH daemons was automatically set up for me, along with the default checks for load average, user count, and process count. If any of these pass a threshold, they'll turn yellow for WARNING, and red for CRITICAL states.
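To make the threshold behaviour concrete, here is a minimal Python sketch of how a plugin might map a measured value onto the four plugin states (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The function names, the label and the "higher is worse" assumption are illustrative only; the stock Debian checks are separate programs with their own options.

```python
# Plugin exit codes as interpreted by Nagios.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify(value, warn, crit):
    """Map a measured value to a plugin state (assumes higher values are worse)."""
    if value is None:
        # No measurement could be taken, so the state is unknown.
        return UNKNOWN
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def report(label, value, warn, crit):
    """Return (exit_code, status_line) in the shape a plugin would emit."""
    state = classify(value, warn, crit)
    return state, f"{label} {STATE_NAMES[state]} - value is {value}"
```

A real plugin would print the status line and exit with the returned code; the web interface then colours the service according to that code.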
This is already somewhat useful, though a server monitoring itself is a bit problematic because of course it won't be able to tell you if it goes completely down. So for the next step, we're going to set up monitoring for the remote host www.example.com, which means firing up your favourite text editor to edit a few configuration files.
Default configuration
Nagios configuration is at first blush a bit complex, because monitoring setups need to be quite finely-tuned in order to be useful long term, particularly if you're managing a large number of hosts. Take a look at the files in /etc/nagios3/conf.d.
# ls /etc/nagios3/conf.d
contacts_nagios2.cfg extinfo_nagios2.cfg generic-host_nagios2.cfg generic-service_nagios2.cfg
hostgroups_nagios2.cfg localhost_nagios2.cfg services_nagios2.cfg timeperiods_nagios2.cfg
You can actually arrange a Nagios configuration any way you like, including one big well-ordered file, but it makes some sense to break it up into sections if you can. In this case, the default setup includes the following files:
- contacts_nagios2.cfg defines the people and groups of people who should receive notifications and alerts when Nagios detects problems or resolutions.
- extinfo_nagios2.cfg makes some miscellaneous enhancements to other configurations, kept in a separate file for clarity.
- generic-host_nagios2.cfg is Debian's host template, defining a few common variables that you're likely to want for most hosts, saving you repeating yourself when defining host definitions.
- generic-service_nagios2.cfg is the same idea, but it's a template service to monitor.
- hostgroups_nagios2.cfg defines groups of hosts in case it's valuable for you to monitor individual groups of hosts, which the Nagios admin allows you to do.
- localhost_nagios2.cfg is where the monitoring for the localhost host we were just looking at is defined.
- services_nagios2.cfg is where further services are defined that might be applied to groups.
- timeperiods_nagios2.cfg defines periods of time for monitoring services; for example, you might want to get paged if a webserver dies 24/7, but you might not care as much about 5% packet loss on some international link at 2am on Saturday morning.
This isn't my favourite method of organising Nagios configuration, but it'll work fine for us. We'll start by defining a remote host, and add services to it.
Testing services
First of all, let's check we actually have connectivity to the host we're monitoring from this server for both of the services we intend to check; ICMP ECHO (PING) and HTTP.
$ ping -n -c 1 www.example.com
PING www.example.com (192.0.43.10) 56(84) bytes of data.
64 bytes from 192.0.43.10: icmp_req=1 ttl=243 time=168 ms
--- www.example.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 168.700/168.700/168.700/0.000 ms
$ wget www.example.com -O - | grep -i found
tom@novus:~$ wget www.example.com -O -
--2012-01-26 21:12:00-- http://www.example.com/
Resolving www.example.com... 192.0.43.10, 2001:500:88:200::10
Connecting to www.example.com|192.0.43.10|:80... connected.
HTTP request sent, awaiting response... 302 Found
...
All looks well, so we'll go ahead and add the host and its services.
Defining the remote host
Write a new file in the /etc/nagios3/conf.d directory called www.example.com_nagios2.cfg, with the following contents:
define host {
    use        generic-host
    host_name  www.example.com
    address    www.example.com
}
The first stanza of localhost_nagios2.cfg looks very similar to this; indeed, it uses the same host template, generic-host. All we need to do is define what to call the host, and where to find it.
However, in order to get it monitoring appropriate services, we might need to add it to one of the already existing groups. Open up hostgroups_nagios2.cfg, and look for the stanza that includes hostgroup_name http-servers. Add www.example.com to the group's members, so that the stanza looks like this:
# A list of your web servers
define hostgroup {
    hostgroup_name  http-servers
    alias           HTTP servers
    members         localhost,www.example.com
}
With this done, you need to restart the Nagios process:
# service nagios3 restart
If that succeeds, you should notice that under your Hosts and Services sections there is a new host called "www.example.com", and it's being monitored for HTTP. At first, it'll be PENDING, but when the scheduled check runs, it should come back (hopefully!) as OK.
This document describes how to install Nagios Core from source.
This guide is broken up into several sections and covers different operating system (OS) distributions. If your OS Distribution is not included in this guide then please contact us to see if we can get it added. Some distributions may be missing as we don't have access to a test environment that allows us to develop the documentation.
Nagios Core 4.3.2 and Nagios Plugins 2.2.1 are what this guide instructs you to install; however, future versions should also work fine with these steps.
This documentation is broken up into two distinct sections:
- Install Nagios Core
- Install Nagios Plugins
This separation is to make a clear distinction as to what prerequisite packages are required by the OS it is being installed on. For example the SNMP packages are installed as part of the Nagios Plugins section, as SNMP is not required by Nagios Core.
Please select your OS:
- Red Hat Enterprise Linux (RHEL)
- CentOS
- Oracle Linux
- Ubuntu
- SUSE SLES | openSUSE Leap
- Debian
- Raspbian
- Fedora
- Arch Linux
- Gentoo
- FreeBSD
- Solaris
- Apple OS X
CentOS | RHEL | Oracle Linux
This guide is based on SELinux being disabled or in permissive mode. Steps to do this are as follows.
sed -i 's/SELINUX=.*/SELINUX=disabled/g' /etc/selinux/config
setenforce 0
Perform these steps to install the pre-requisite packages.
yum install -y gcc glibc glibc-common wget unzip httpd php gd gd-devel
cd /tmp
wget -O nagioscore.tar.gz https://github.com/NagiosEnterprises/nagioscore/archive/nagios-4.3.2.tar.gz
tar xzf nagioscore.tar.gz
Compile
cd /tmp/nagioscore-nagios-4.3.2/
./configure
make all
Create User And Group
This creates the nagios user and group. The apache user is also added to the nagios group.
useradd nagios
usermod -a -G nagios apache
Install Binaries
This step installs the binary files, CGIs, and HTML files.
make install
Install Service / Daemon
This installs the service or daemon files and also configures them to start on boot. The Apache httpd service is also configured at this point.
===== CentOS 5.x / 6.x | RHEL 5.x / 6.x | Oracle Linux 5.x / 6.x =====
make install-init
chkconfig --add nagios
chkconfig --level 2345 httpd on
===== CentOS 7.x | RHEL 7.x | Oracle Linux 7.x =====
make install-init
systemctl enable nagios.service
systemctl enable httpd.service
Information on starting and stopping services will be explained further on.
Install Command Mode
This installs and configures the external command file.
make install-commandmode
Install Configuration Files
This installs the *SAMPLE* configuration files. These are required as Nagios needs some configuration files to allow it to start.
make install-config
Install Apache Config Files
This installs the Apache web server configuration files. Also configure Apache settings if required.
make install-webconf
Configure Firewall
You need to allow port 80 inbound traffic on the local firewall so you can reach the Nagios Core web interface.
===== CentOS 5.x / 6.x | RHEL 5.x / 6.x | Oracle Linux 5.x / 6.x =====
iptables -I INPUT -p tcp --destination-port 80 -j ACCEPT
service iptables save
ip6tables -I INPUT -p tcp --destination-port 80 -j ACCEPT
service ip6tables save
===== CentOS 7.x | RHEL 7.x | Oracle Linux 7.x =====
firewall-cmd --zone=public --add-port=80/tcp
firewall-cmd --zone=public --add-port=80/tcp --permanent
Create nagiosadmin User Account
You'll need to create an Apache user account to be able to log into Nagios.
The following command will create a user account called nagiosadmin and you will be prompted to provide a password for the account.
htpasswd -c /usr/local/nagios/etc/htpasswd.users nagiosadmin
When adding additional users in the future, you need to remove -c from the above command, otherwise it will replace the existing nagiosadmin user (and any other users you may have added).
Start Apache Web Server
===== CentOS 5.x / 6.x | RHEL 5.x / 6.x | Oracle Linux 5.x / 6.x =====
service httpd start
===== CentOS 7.x | RHEL 7.x | Oracle Linux 7.x =====
systemctl start httpd.service
Start Service / Daemon
This command starts Nagios Core.
===== CentOS 5.x / 6.x | RHEL 5.x / 6.x | Oracle Linux 5.x / 6.x =====
service nagios start
===== CentOS 7.x | RHEL 7.x | Oracle Linux 7.x =====
systemctl start nagios.service
Test Nagios
Nagios is now running. To confirm this, log into the Nagios Web Interface.
Point your web browser to the IP address or FQDN of your Nagios Core server, for example:
http://10.25.5.143/nagios
http://core-013.domain.local/nagios
You will be prompted for a username and password. The username is nagiosadmin (you created it in a previous step) and the password is what you provided earlier.
Once you have logged in, you are presented with the Nagios interface. Congratulations, you have installed Nagios Core.
BUT WAIT ...
Currently you have only installed the Nagios Core engine. You'll notice some errors under the hosts and services along the lines of:
(No output on stdout) stderr: execvp(/usr/local/nagios/libexec/check_load, ...) failed. errno is 2: No such file or directory
These errors will be resolved once you install the Nagios Plugins, which is covered in the next step.
Installing The Nagios Plugins
Nagios Core needs plugins to operate properly. The following steps will walk you through installing Nagios Plugins.
These steps install nagios-plugins 2.2.1. Newer versions will become available in the future and you can use those in the following installation steps. Please see the releases page on GitHub for all available versions.
Please note that the following steps install most of the plugins that come in the Nagios Plugins package. However there are some plugins that require other libraries which are not included in those instructions. Please refer to the following KB article for detailed installation instructions:
Unixmen
15 Oct 2013 | Nagios
Nagios Core 4.0.1 has been released and can be downloaded from the downloads page. The Changelog can be found here. This release contains bug fixes and updates related to file downtime scheduling, updates to the RPM spec file and more. Thanks to all who contributed to this release.
Rootdev
Don't get me wrong, I like Nagios. I think it's an excellent piece of software and I have spent many years working with it, but I have just completed a proof of concept and gained approval to deploy OpenNMS as a new Enterprise Grade Network Monitoring System. And the main system targeted for replacement here? That's right, it's Nagios, which is primarily running via the remote plugin model, using the NRPE daemon to run scripts on remote hosts and report back to base.
Now anyone who has ever played with Nagios will know that it can be a beast of a thing to set up and get working satisfactorily. In fact, most places will devote a good year or so to the process. As a newbie, sitting in front of a freshly installed Nagios instance and wondering how to get it to do something can be an extremely disheartening experience. Once it's up and running though it's usually fairly low maintenance to keep it going, and not too difficult to add new devices or custom plugins as you go along. And, for the most part, it is good at what it does, so why would you want to replace it?
Well, despite having almost unparalleled abilities to monitor at the application level and perform any manner of esoteric checks, Nagios does have its limitations.
A Question of Scale
One of the biggest problems I have encountered with various Nagios implementations is one of scale. Put simply, Nagios does not scale well.
Too Much Information, Too Little Visibility
I have seen Nagios implementations monitoring hundreds or even thousands of hosts and services where the corresponding Host Detail and Service Detail screens are simply so big that they refresh themselves before you can scroll even half way down the page.
The Tactical Overview page gives you a simple view into the number of current issues, but doesn't tell you at a glance what or where they are.
This makes using it in a NOC something of a chore as you actually have to interact with it to get at the information you require. It also has fairly poor visibility into historical data, although this can be addressed to some extent using additional plugins such as perfparse, and it has little to no reporting output, both of which are things The Business tend to like rather a lot.
Timeout, I Tripped Myself Up
Running custom plugins to fit any ad-hoc monitoring requirement might seem like a good idea as you have total control over the requirements and the output, and for what it's worth, I like writing Nagios plugins, I've written them for the NRPE daemon as well as for places where the plugins are installed locally and run over ssh.
In both instances I have seen occasions where the amount of time taken to do a single poll run can take longer than the amount of time taken to gather the results of that poll, and have seen systems come crashing to their knees as a result.
Please Invent Me a Wheel
From my experience this is probably the most misunderstood issue with Nagios; people will spend a long time writing all manner of shell scripts or Perl scripts to plug in to Nagios to return all manner of incredibly useful data, which is all well and good, but most of that information is available already, at significantly less cost (both computational and time), from SNMP.
Yes, Nagios is perfectly capable of polling SNMP, it's just that I've not yet come across anyone who was using it that way by default, and once you have the system set up with dozens or even hundreds of plugins, making the choice to convert to SNMP would be an administrative nightmare.
OpenNMS
Nagios® is a system and network monitoring application. It watches hosts and services that you specify, alerting you when things go bad and when they get better. OpenNMS has a fundamentally different architecture and set of goals compared to Nagios. The official Nagios Core 3.x Documentation says:
Note: Nagios is not designed to be a replacement for a full-blown SNMP management application like HP OpenView or OpenNMS.
Like OpenNMS, Nagios is Free / Libre / Open Source Software. Nagios Enterprises sponsors and provides support for Nagios.
April 14, 2008 | LinkedIn
I'm presently in an environment where we use both SiteScope and Nagios. This is not because there is anything that one of them does that the other can't technically also do. It is simply an outgrowth of having two different teams with a vested interest in monitoring and alerting with totally different backgrounds and approaches. In our case it is a NOC that wanted a Windows-based SiteScope server and a Systems Operations group that already had a number of Linux-based Nagios servers.
I've dealt with both, though I'm considerably more familiar with Nagios. We made a few attempts to migrate Nagios checks into SiteScope but found it somewhat awkward to make happen. Mostly this revolved around the fact that custom scripts for generating checks as new services were discovered depended on a number of tools that did not easily exist on the Windows platform (though Cygwin did eliminate some of that difficulty). Since SiteScope can be implemented on a Unix platform though, this could easily be a non-issue for other environments.
Even if we had been able to seamlessly migrate such scripts, however, there doesn't seem to be much documentation on modifying SiteScope's configuration in a scripted way. I expect that had we been able to reach this hurdle we may well have stumbled.
I think this last point though is why I prefer Nagios to SiteScope. If I want to get fancy, I can dig into Nagios for myself and come up with a solution. With SiteScope, you're fine if you're doing something with the canned checks or features, but anything else gets cumbersome.
SiteScope does score over Nagios on the availability of well supported checks and solution sets. Nagios may well be able to perform the same checks with a plug-in, but the quality and supportability of Nagios plug-ins is somewhat hit or miss.
Then of course there is the price tag. Nagios is free, whereas Mercury charges (and quite a bit if you have a need for lots of monitoring points). Professional Nagios consulting and support is available though, so it doesn't have to be a roll-your-own sort of thing if you *are* willing to spend some money.
Because of its open nature, Nagios tends to integrate well with almost anything. The most recent releases have added a new feature called NDO (Nagios Data Out) which puts virtually anything you could want to know about your Nagios environment into an SQL database, including check results and configuration. This has facilitated quite a few interesting add-ons, including one called NagVis that allows graphical views to be created in almost any organizational structure you desire.
Mercury seems to stress the fact that SiteScope is 'agentless'. Personally, I feel this is misleading. It is true that you don't have to run an agent on clients to collect data and serve it up to your monitoring server. However it also means that most of the sorts of checks that you'd normally expect to be handled by an agent are instead handled by SiteScope logging into the box via SSH and running commands or scripts local to the box. In effect, sshd and a set of scripts become your agents. Given the choice, I prefer the well managed configuration of NRPE (Nagios Remote Plug-in Execution), which effectively allows the same capability without having to create a privileged service account for your SiteScope service (to my thinking, a huge security risk).
In summary, Nagios is not for the faint of heart. Its configuration is complex, but infinitely more manageable than that of SiteScope. With thought and planning, you can do some amazing things with it. The mixed blessing is that you can make custom solutions for almost anything; the downside is that you may have to do so if you can't find an existing solution.
SiteScope is actually quite nice, however. If your needs are well defined and unlikely to change much, then SiteScope could potentially be a smoother fit. It costs a good bit, but saves you time by working out of the box.
I use both currently. The big difference is nagios is infinitely more scriptable- you can easily create a combination of perl + snmp + nmap to generate the configs.
One thing sitescope has going for it is VuGen (which I personally dislike) to write site walk-through scripts. If you have a site that uses heavy javascript, the nagios alternatives (twill, webinject, etc) don't get the job done. On the other hand, VuGen only works on the windows-version of sitescope.
Another hit against sitescope is that if you have many checks, it misbehaves when doing confirmations and can cause backups and other grief. It also likes to lose its brains and start throwing errors for no reason; where I work they ended up setting sitescope to restart once a day because around the 36 hour mark it started randomly failing checks until it was restarted.
I'm obviously biased towards Nagios, but I think it's also possible/feasible to use sitescope only for VuGen one-offs/walkthroughs and Nagios for everything else.
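The scriptability mentioned in this answer comes down to the fact that Nagios object definitions are plain text, so any script can emit them. A minimal Python sketch follows, assuming an inventory that is already available as a name-to-address mapping (the discovery step via snmp or nmap is not shown). The helper names are hypothetical, and generic-host is the Debian template name, which may differ on other installs.

```python
def host_definition(host_name, address, template="generic-host"):
    """Render one Nagios 'define host' stanza as text."""
    fields = [("use", template), ("host_name", host_name), ("address", address)]
    # Pad the directive names so the values line up in a column.
    body = "\n".join(f"    {key:<12}{value}" for key, value in fields)
    return "define host {\n" + body + "\n}\n"


def render_hosts(inventory):
    """Render stanzas for a mapping of host_name -> address, sorted by name."""
    return "\n".join(
        host_definition(name, address) for name, address in sorted(inventory.items())
    )
```

The rendered text could be written to a file in the configuration directory and picked up on the next restart; validating the result before restarting is left to the reader.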
Paul S.
The great thing about Nagios is that it is a task scheduler and reporting tool. We use it to monitor all sorts of things including the usual disk-space and memory; however, with a bit of code (python in our case!) we've created a plugin to check the currently running version of Debian and alert us if a "dist-upgrade" is required.
If you can provide us with the data for anything that you want monitored (and that includes business processes such as number of sales calls made per day!), we can monitor it in Nagios and configure it to send you alerts.
Nagios is so much more than a Network Monitoring System - It rocks!
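The poster's plugin is not shown, so the following is only a guess at the general shape of such a check, written as a Python sketch. It assumes the plugin is handed the text of a simulated upgrade run (for example from apt-get with the -s option) and looks for the usual summary line; the function name and messages are hypothetical.

```python
import re

OK, WARNING, UNKNOWN = 0, 1, 3

# Summary line printed at the end of an apt-get run.
SUMMARY = re.compile(
    r"(\d+) upgraded, (\d+) newly installed, (\d+) to remove and (\d+) not upgraded"
)


def upgrade_state(simulation_output):
    """Classify the text of a simulated dist-upgrade run as a plugin state."""
    match = SUMMARY.search(simulation_output)
    if match is None:
        # Without the summary line we cannot say anything about the system.
        return UNKNOWN, "UNKNOWN - no upgrade summary found"
    upgraded, installed, removed, _not_upgraded = (int(n) for n in match.groups())
    pending = upgraded + installed + removed
    if pending:
        return WARNING, f"WARNING - dist-upgrade required ({pending} package changes pending)"
    return OK, "OK - no dist-upgrade required"
```

Treating pending changes as WARNING rather than CRITICAL is a design choice made for this sketch, not something the post specifies.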
I have used Nagios and its predecessor Netsaint. Having the option to configure whatever check you might need (and yes, quite a few have been made to tailor to the needs of the companies), there is little that I see as a con.
Configuration from the ground up needs to be done no matter what package you select. However, using templates, it can be done (yes, it requires typing). Dependencies, escalation models, host and service information: it's all in there.
On the other hand, when others need to configure Nagios also, it is handy to have the tooling available to 'click' your config in. Running Nagios with slave monitoring and failover, including the clickable config = opsview
Link included below - nagiosexchange already has been mentioned.
Links:
Jan 10, 2008 | IT Resource Center forums
Marc,
I can only give you my opinion on SiteScope based on its comparison to Internet Services as I don't know anything about Nagios.
SiteScope is sharp and easy to use, install, etc. I would recommend using SiteScope for website monitoring as well as other network testing, e.g. DNS, DHCP, etc. With that said, as for the agentless server monitoring, I've only set up 1 Unix and 1 Windows monitor. The Unix one is very easy to do, and works well. As for the Windows side, well, that was another story. In our environment NetBIOS is not acceptable, so I had to go with SSH, which involves installing and setting up OpenSSH. After I learned how to use this product it was fairly easy to set up monitoring of CPU and memory, and I am monitoring 1 application on a Windows server. BUT, I couldn't imagine training all of our SAs to install OpenSSH on 500+ Windows servers, so the jury is still out on that one.
I recommend you read the specs on SiteScope and determine if it is the type of monitoring you are looking for. SiteScope does many different things, of which only about 10 will be used in our environment.
One other thing worth mentioning is web transactions. They have a URL sequence tool available, but I've found it difficult to use in situations where you have to add products to a cart and then attempt to check out. If you want to do those types of transactions, I recommend using VuGen and then buying the license that allows SiteScope to read the type of scripts VuGen creates, unless you also want to purchase BPM; VuGen and BPM are designed for each other. Confusing, to say the least. But the URL sequencer is good if you want to test various hyperlinks or even forms to navigate through a sequence of webpages simulating what a user might do.
Anyhow, that's all I can think of at this moment. Hope it is useful.
May 15, 2006 | ReadList.com
Keller, Steve
Hi,
We have a large SiteScope installation (we monitor several thousand servers with about 20 SiteScope machines) and are considering Nagios as a replacement. One of the issues we have is that we cannot, for various (mostly political) reasons, install an agent on the hosts we monitor. We have successfully tested agentless monitoring by using Perl plugins that SSH to the remote host and run a command, parsing the output. However, we cannot run as many monitors per server as in our SiteScope installation because of the latency incurred by setting up and breaking down the SSH connection for EACH monitor.
So my question is, does anyone know of a reasonable approach to agentless monitoring using Nagios? We are planning to try SSH4, which automatically keeps SSH connections open for later use, and forcing Nagios to run several scripts sequentially against the same host. But this has the problem that a host which is down, or busy, could delay checking other hosts.
We would like, for many reasons (not just $$$, although that's a factor), to use Nagios, but having to install twice or three times as many servers to support it is out of the question. Any advice gratefully accepted.
Thanks,
Steve Keller (skeller)
Manager, Tools and ESM Group
Global Infrastructure Services
Jason Martin
On Mon, May 15, 2006 at 03:34:53PM -0700, Keller, Steve wrote:
> a replacement. One of the issues we have is that we cannot, for various (mostly political) reasons, install an agent on the hosts we monitor.
By any chance is SNMP already in place on the target hosts? That might cut down on the number of agent-on-demand checks you have to run.
> So my question is, does anyone know of a reasonable approach to
> agentless monitoring using Nagios? We are planning to try SSH4, which
Check out FSH (http://freshmeat.net/projects/fsh/) which does a similar thing. That should take care of the SSH overhead problem.
> this has the problem that a host which is down, or busy, could delay
> checking other hosts.
If you are writing this in perl, you can have your plugins time out gracefully to prevent that. Nagios can be configured to time out plugins as well.
-Jason Martin
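The reply above talks about Perl; the same idea of a plugin that gives up gracefully can be sketched in Python. This is an illustration only: run_check is a hypothetical wrapper, the 10 second default is an arbitrary assumption, and reporting a timeout as UNKNOWN (3) is one possible choice rather than a rule.

```python
import subprocess

UNKNOWN = 3  # plugin exit code for "Unknown"


def run_check(argv, timeout_seconds=10):
    """Run a check command; return (exit_code, first_line_of_output)."""
    try:
        proc = subprocess.run(
            argv, capture_output=True, text=True, timeout=timeout_seconds
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing is left hanging.
        return UNKNOWN, f"UNKNOWN - check timed out after {timeout_seconds}s"
    except OSError as exc:
        # For example, the command does not exist or is not executable.
        return UNKNOWN, f"UNKNOWN - could not run check: {exc}"
    lines = proc.stdout.strip().splitlines()
    first_line = lines[0] if lines else "(no output)"
    # Anything outside the four plugin states is reported as UNKNOWN.
    code = proc.returncode if proc.returncode in (0, 1, 2, 3) else UNKNOWN
    return code, first_line
```

Because a slow or dead host now costs at most the timeout, one stuck check no longer holds up the checks queued behind it for longer than that bound.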
--
"Keyboard? How quaint!" - Scotty
Eli Stair
Bringing up SNMP is a valid point (I'm currently handling ~25% of my active service checks this way). However there are a number of scenarios where the load on both the server/network/client is significantly greater to pull down a tree that needs processing (process table for instance), as the impact on the client is fairly large to process its own /proc entries to generate the values, pull them down sequentially, and parse the tree on the server. Same deal with a variety of network tables... Even Cisco/Foundry/etc do a horrible job of having their devices process & generate ARP tables, etc.
Even executing a remote command over SSH with the crypto overhead is faster in most situations (for me), and actually consumes less cycles on BOTH ends... This FSH project looks promising, though hasn't been updated since 2001... A scary prospect for anything that is crypto/authentication based :)
I don't see any reason we couldn't whip up an active check script that runs a number of commands sequentially over the SSH session that's set up at the beginning, applies the results as separate passive service checks. That's the only way I can think of to handle it, since each service check will otherwise be initiating a separate connection, at whatever rate is determined by its schedule in the queue.
Then again, check_by_fsh sounds nice too! Have to look at SSH4 features now that you mentioned it Steve.
Just my thoughts.
/eli
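The batching idea in the message above (one session per host, several results handed back as separate passive service checks) can be sketched in Python. The sketch only formats the results as PROCESS_SERVICE_CHECK_RESULT external command lines; collecting the results over SSH and writing the lines to the Nagios external command file are not shown, and the host and service names in any usage are hypothetical.

```python
import time


def passive_result_line(host, service, code, output, now=None):
    """Format one PROCESS_SERVICE_CHECK_RESULT external command line."""
    timestamp = int(time.time() if now is None else now)
    # Each external command is a single line, so fold any newlines in the output.
    clean = output.replace("\n", " ").strip()
    return f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;{host};{service};{code};{clean}"


def batch_results(host, results, now=None):
    """Format (service_description, exit_code, output) tuples gathered in one session."""
    return [
        passive_result_line(host, service, code, output, now)
        for service, code, output in results
    ]
```

The service descriptions must match services already defined for that host that accept passive checks; otherwise Nagios has nothing to attach the submitted results to.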
John P. Rouillard
In message <C08F7254.27ECE%estair>,
Eli Stair writes:
>Even executing a remote command over SSH with the crypto overhead is faster
>in most situations (for me), and actually consumes less cycles on BOTH
>ends... This FSH project looks promising, though hasn't been updated since
>2001... A scary prospect for anything that is crypto/authentication based :)
Actually fsh is just a wrapper over ssh/rsh so it doesn't have any security implications of its own. It shares the security of the underlying transport.
>I don't see any reason we couldn't whip up an active check script that runs
>a number of commands sequentially over the SSH session that's set up at the
>beginning, applies the results as separate passive service checks. That's
>the only way I can think of to handle it, since each service check will
>otherwise be initiating a separate connection, at whatever rate is
>determined by its schedule in the queue.
check_by_ssh can run multiple commands in one shot and report each output line to the proper service. See the -s flag and its use with multiple -C commands.
>Then again, check_by_fsh sounds nice too! Have to look at SSH4
>features now that you mentioned it Steve.One problem is that you have to keep a master ssh connection permanently open and manage the connection if you aren't using
fsh. For a lot of hosts (1000+), this could put a resource strain on the server as ports are taken up and 1000 ssh permanent ssh process are created.One thing that would also be nice for check_by_ssh would be the ability to use an ssh_agent for the keys. Sadly the current
check_by_ssh sanitizes the environment a bit too well and removed the environment variables used to allow ssh to communicate with it's agent.-- rouilj
John Rouillard
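The multi-command mode John mentions can be sketched as a small dry-run helper that only prints the check_by_ssh command line it would run. The flags used (-H, -n, -s with a colon-separated service list, one -C per remote command, -O for the external command file) are check_by_ssh's passive-mode options; the function name, host names, and remote plugin commands are hypothetical, and service names are assumed to contain no spaces or colons.

```shell
#!/bin/sh
# Dry run: print a check_by_ssh invocation that runs several remote commands
# over one connection and reports each result as a passive service check.
# Usage: build_check_by_ssh ADDRESS NAGIOS_HOST CMD_FILE SERVICE COMMAND [SERVICE COMMAND ...]
build_check_by_ssh() {
    addr="$1"; name="$2"; out="$3"
    shift 3
    services=""
    cmds=""
    while [ "$#" -ge 2 ]; do
        # -s takes the service names joined with ':' in the same order as the -C options
        services="${services:+$services:}$1"
        cmds="$cmds -C '$2'"
        shift 2
    done
    echo "check_by_ssh -H $addr -n $name -s $services$cmds -O $out"
}
```

For example, `build_check_by_ssh 10.0.0.100 mail /var/nagios/rw/nagios.cmd Load "check_load -w 5 -c 10" Disk "check_disk -w 10%"` prints a single command line with two -C options, so both results travel over one SSH connection.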
check_hpasm is a plugin for Nagios which checks the hardware health of Hewlett-Packard Proliant servers. To accomplish this, you must have installed the hpasm package. The plugin checks the health of processors, power supplies, memory modules, fans, CPU- and board-temperatures, and alerts you if one of these components is faulty or operates outside its normal parameters.
check_openmanage is a plugin for Nagios that checks the hardware health of Dell servers running OpenManage Server Administrator (OMSA). The plugin can be used remotely with SNMP or locally with NRPE, check_by_ssh, or similar. It checks the health of the storage subsystem, power supplies, memory modules, temperature probes, etc., and gives an alert if any of the components are faulty or operate outside normal parameters.
TechSoup
Just wondering if anyone has any opinion about Zenoss vs Nagios. I have helped people set up one or the other, but I have not used either of them over time.
Zenoss looks like it has a nicer user interface, but it was less intuitive for how it was set up. (I used the NagioSql tool for administering much of Nagios, and that made the setup much nicer)
freshmeat.net
check_mk is a general purpose Nagios-plugin for retrieving data. It adopts a new approach for collecting data from operating systems and network components. It obsoletes NRPE, check_by_ssh, NSClient, and check_snmp and it has many benefits, the most important of which are significant reduction of CPU usage on the Nagios host and automatic inventory of items to be checked on hosts. The larger your Nagios installation is, the more helpful these improvements.
This is the second part of the two-part series by Wojciech Kocjan in which we have made an effort to cover everything in notifications and events in Nagios 3.0. The first part covered effective notifications and escalations.
In this article, we will cover the following sub-topics:
- External Commands
- Event Handlers
- Modifying Notifications
- Adaptive Monitoring
External Commands
Nagios offers a very powerful mechanism for receiving events and commands from external applications: the external commands pipe. This is a pipe file created on a file system that Nagios uses to receive incoming messages. The name of the file is rw/nagios.cmd and it is located in the directory passed as the localstatedir option during compilation. Following the compilation and installation instructions and the given guidelines, the file name will be /var/nagios/rw/nagios.cmd.
The communication does not use any authentication or authorization; the only requirement is to have write access to the pipe file. An external command file is usually writable by the owner and the group; the usual group used is nagioscmd. If you want a user to be able to send commands to the Nagios daemon, simply add that user to this group.
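Since write access is the only gate, a quick way to see whether a given account can submit commands is to test the pipe directly. The following is a minimal sketch; the function name is hypothetical and the default path is the one quoted in the text, which may differ on your installation.

```shell
#!/bin/sh
# Report whether the current user could submit external commands.
# $1 = path to the command pipe (defaults to the path used in the article).
can_write_cmd_pipe() {
    pipe="${1:-/var/nagios/rw/nagios.cmd}"
    if [ ! -p "$pipe" ]; then
        # Missing, or present but not a named pipe (Nagios may not be running)
        echo "not a pipe: $pipe"
        return 2
    fi
    if [ ! -w "$pipe" ]; then
        # Typically means the user is not in the nagioscmd group
        echo "no write access: $pipe"
        return 1
    fi
    echo "writable: $pipe"
}
```

Run it as the user in question, for example with `su` or `sudo -u`, because the result depends on that user's group membership.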
A small limitation of the command pipe is that there is no way to get any results back and so it is not possible to send any query commands to Nagios. Therefore, by just using the command pipe, you have no verification that the command you have just passed to Nagios has actually been processed, or will be processed soon. It is, however, possible to read the Nagios log file and check if it indicates that the command has been parsed correctly, if necessary.
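The log-file check mentioned above can be sketched as follows. It assumes that logging of external commands is enabled (the log_external_commands option), in which case Nagios writes a line containing "EXTERNAL COMMAND:" followed by the command text. The function name is hypothetical and the log location depends on the installation.

```shell
#!/bin/sh
# Return success if the Nagios log shows that a command was received.
# $1 = Nagios log file, $2 = command text as submitted, without the [timestamp] prefix.
command_was_logged() {
    # -F: match the command literally, so ';' and '.' are not treated as patterns
    grep -F -q "EXTERNAL COMMAND: $2" "$1"
}
```

A caller would submit the command, wait a moment, and then run something like `command_was_logged /var/nagios/nagios.log "ADD_HOST_COMMENT;somehost;1;Security Audit;ok"`; the log path here is only an example.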
An external command pipe is used by the web interface to control how Nagios works. The web interface does not use any other means to send commands or apply changes to Nagios. This gives a good understanding of what can be done with the external command pipe interface.
From the Nagios daemon perspective, there is no clear distinction as to who can perform what operations. Therefore, if you plan to use the external command pipe to allow users to submit commands remotely, you need to make sure that the authorization is in place as well so that it is not possible for unauthorized users to send potentially dangerous commands to Nagios.
The syntax for formatting commands is easy. Each command must be placed on a single line and end with a newline character.
Troubleshooting Nagios 3.0 | Tuesday, April 7, 2009 | Linux Servers
In this article by Wojciech Kocjan, we will learn about troubleshooting Nagios 3.0, which includes troubleshooting the web interface, passive checks, SSH-based checks, and NRPE. The article includes various possible errors along with their solutions and detailed explanations for each error listed.
Notifications and Events in Nagios 3.0, part 2 | Tuesday, May 12, 2009 | Linux Servers
Learning Nagios 3.0 Table of Contents Monday, October 27, 2008 | All
Notifications and Events in Nagios 3.0-part1 Monday, May 11, 2009 | Linux Servers
This is a 2-part series by Wojciech Kocjan. We have made an attempt to cover all about events and notifications in Nagios 3.0 in detail in this series. The following sub-topics will be covered as a part of this series:
- Effective Notifications
- Escalations
- External Commands
- Event Handlers
- Modifying Notifications
- Adaptive Monitoring
Passive Checks and NSCA (Nagios Service Check Acceptor) | Wednesday, November 19, 2008 | Networking & Telephony
Nagios is a very powerful platform because it is easy to extend. A great feature that Nagios offers is the ability for third-party software or other Nagios instances to report information on the status of services or hosts. This way, Nagios does not need to schedule and run checks by itself, but other applications can report information as it is available to them. This means that your applications can send problem reports directly to Nagios, instead of just logging them. In this way, your applications can benefit from powerful notification systems as well as dependency tracking. In this article by Wojciech Kocjan, we will see how this mechanism can also be used to receive failure notifications from other services or machines, for example SNMP traps.
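As a rough illustration of how an application might report a status itself, the sketch below formats a passive service result using the PROCESS_SERVICE_CHECK_RESULT external command (host name, service description, return code, plugin output). The function name, host, and service are hypothetical. On the Nagios server the resulting line would be written to the external command pipe; from a remote machine it would normally travel via NSCA instead.

```shell
#!/bin/sh
# Format one passive service check result for the Nagios external command pipe.
# $1 = host name, $2 = service description, $3 = return code (0-3),
# $4 = plugin output text, $5 = optional Unix timestamp (defaults to now)
passive_result() {
    ts="${5:-$(date +%s)}"
    printf '[%s] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%s;%s\n' "$ts" "$1" "$2" "$3" "$4"
}
```

A backup script could, for instance, end with `passive_result web1 Backup 2 "backup failed" >> /var/nagios/rw/nagios.cmd`, assuming the service accepts passive checks.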
September 11th, 2008 | Linux Journal
A while back, I wrote an article for Linux Journal's web edition entitled "Howto be a good (and lazy) System Administrator." A couple astute readers, after reading the article, asked if I was familiar with the Nagios monitoring system, and I am. I've been using Nagios for a few years now.
I had intended to write this article as a How-to on getting Nagios configured and running for the first time. However, it turns out that the documentation that comes with Nagios is really pretty good. And even if you do have problems, and I did, the user community is also quite responsive. So, rather than beating a dead horse, (with sympathy to horse lovers) I decided to continue the Good and Lazy Administrator Theme and discuss extending Nagios with custom service checks and custom notifications.
Nagios uses a plug-in mechanism to implement all of its server and service checks as well as all of its notifications. This is good news for hackers, as it allows us to build new functionality that either no one else has thought of, or has need of. I wrote a couple scripts for my Nagios system. One does a custom service check to see if I have voicemail waiting for me at the Help Desk, and the other does a custom notification by telephone. Before I go on, I should give a little bit of background.
I maintain several servers, both for myself and for customers. These servers range from web servers to phone systems running Asterisk. Just like most System Administrators, I don't have any "optional" servers or services; the stuff just has to work, and when it isn't, I need to know. But I'll tell you, I'm not interested in sitting at the desk watching the Help Desk phone or the monitoring screen. I'm either too lazy, or too busy. Either way, that's what silicon is for, right?
My phone system at the house runs on Asterisk. You can read more about my home infrastructure at http://www.linuxjournal.com/article/9111. My Nagios server runs on the same server, so it just makes sense to integrate the two services.
I've created a Nagios script that monitors the Help Desk voicemailbox and sets a service alert if there are any critical alerts in Nagios. I've also written a script that can call me, perhaps on my cellphone, in the event of a service outage. With these two scripts in place, I can get a call on my cellphone any time someone calls my Help Desk and leaves a message. I can also get a call if any of my monitored services fail. Theoretically, I can be at a park playing with my boys and know that my servers are happy... until the cellphone rings.
I understand that I have kind of a unique situation, but the same concept is applicable in a business production environment, so let's get down to looking at code.
First, let's talk about the Help Desk monitoring script. Essentially, this script checks to see if there are any files in the INBOX in the Help Desk mailbox. Here is the code:
    #!/usr/bin/perl -w
    # Nagios check: CRITICAL if the Help Desk voicemail INBOX contains any files.
    local *DIR;
    my ($file, $error);
    $error = 0;
    opendir DIR, "/var/spool/asterisk/voicemail/customers/611/INBOX/"
        or die("Error: Permission denied\n");
    # Count every directory entry except "." and "..".
    while ($file = readdir(DIR)) {
        if ($file eq ".") { next; }
        if ($file eq "..") { next; }
        $error++;
    }
    $error = $error/4;
    if (!$error) {
        print "OK\n";
        exit 0;
    } else {
        print "CRITICAL: $error\n";
    }
    exit 2;
Of course, you need to make sure that the Nagios user has access to the Asterisk voicemailbox, but that can be taken care of by setting the script set-uid. The script, as you can see, is pretty simple. If there are any other files in the directory, the script assumes that there is a voicemail and sends a CRITICAL alert to Nagios. Otherwise everything is OK.
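The same idea can be sketched in shell with all three relevant plug-in states made explicit: 0 for OK, 2 for CRITICAL, and 3 for UNKNOWN when the directory cannot be read. This is not the author's script. The function name is hypothetical, and it reports the raw file count rather than dividing by four as the Perl version does.

```shell
#!/bin/sh
# Nagios-style check: any file in the given inbox directory is CRITICAL.
# $1 = inbox directory to inspect
check_voicemail() {
    inbox="$1"
    if [ ! -d "$inbox" ] || [ ! -r "$inbox" ]; then
        echo "UNKNOWN: cannot read $inbox"
        return 3
    fi
    # ls -A lists every entry except "." and ".."; tr strips padding some wc versions add
    count=$(ls -A "$inbox" | wc -l | tr -d ' ')
    if [ "$count" -eq 0 ]; then
        echo "OK"
        return 0
    fi
    echo "CRITICAL: $count file(s) in inbox"
    return 2
}
```

A wrapper script used as the Nagios command_line would call the function with the real INBOX path and finish with `exit $?` so that Nagios sees the return code.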
To enable Nagios to use this check script, we need to define it in checkcommands.cfg. Here is the definition I used:
    define command{
        command_name    check_help
        command_line    /etc/nagios/local/check_611.pl
    }
Now, I can refer to the check_help check script in the services.cfg file. Here's how I did it:
    define service {
        use                     generic-service
        name                    Help_Desk
        host_name               my_server
        service_description     Help Desk Voicemail
        check_command           check_help
        register                1
    }
With this configuration in place, Nagios can indicate an alarm any time there is voicemail in the Help Desk mailbox. But that's only half of what I promised to write about. The next script allows Nagios to call me to let me know that I've got a fire to put out. Here is that script:
    #!/usr/bin/perl
    # Create an Asterisk call file for each number and move it into the outgoing spool.
    # The opening of the heredoc, including the call file's Channel: line, is missing from the source text.
    foreach $main::phone ("15055551234") {
        $main::call = <<EOF;
    MaxRetries: 0
    RetryTime: 1
    WaitTime: 120
    Account: Enterprise
    Context: apps
    Extension: OUTAGE
    Priority: 1
    EOF
        # Write the file elsewhere first, then move it into the spool directory.
        open FILE, ">/tmp/outage.call";
        print FILE $main::call;
        close FILE;
        system("mv /tmp/outage.call /var/spool/asterisk/outgoing");
    }
As you can see, this script isn't complicated, either. It simply creates an Asterisk "call file" and puts it in Asterisk's outgoing spool directory. The script is capable of calling multiple numbers... just in case. It's important that the call file be created in another directory and then moved into the spool directory. Otherwise bad things can happen if Asterisk tries to read the file while the script is still writing it.
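The write-elsewhere-then-move step is worth isolating, because it is what keeps Asterisk from reading a half-written file. Below is a minimal shell sketch of that pattern; the function name and directory arguments are hypothetical. One caveat: mv is an atomic rename only when the staging and spool directories are on the same filesystem, so a staging directory next to the spool is a safer choice than /tmp on systems where the two are separate mounts.

```shell
#!/bin/sh
# Write a call file in a staging directory, then move it into the spool in one step.
# $1 = call file contents, $2 = staging directory, $3 = spool directory
spool_call_file() {
    # mktemp gives each call its own file name, so concurrent alerts do not collide
    tmp=$(mktemp "$2/outage.XXXXXX") || return 1
    printf '%s\n' "$1" > "$tmp" || return 1
    # The file appears in the spool only after it has been completely written
    mv "$tmp" "$3/$(basename "$tmp").call"
}
```

Using a unique name per call also avoids the fixed /tmp/outage.call path, which two simultaneous notifications would otherwise overwrite.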
Obviously this script relies on some configuration in the Asterisk dial plan. Here is the relevant part of the dial plan:
    exten => OUTAGE,1,answer
    exten => OUTAGE,2,playback(/etc/asterisk/sounds/OUTAGE)
    exten => OUTAGE,3,hangup
At this point, you're probably realizing that I'm not doing anything complicated. All that is needed from Asterisk's point of view is an audio message in /etc/asterisk/sounds/OUTAGE (.wav or .au) that indicates that something is on fire. Asterisk will select the most reasonable file extension and play the file when the call is answered.
So all that is left to do is configure Nagios to use this notification method. This is configured in the misccommands.cfg file. Here is how I did it:
    # 'notify_by_phone' command definition
    define command{
        command_name    notify_by_phone
        command_line    /etc/nagios/local/notify_by_phone.pl
    }
Now that all of the configuration is done, we restart Nagios and reload the Asterisk dial plan. To do this, we type "/etc/init.d/nagios restart" at the command line and "extensions reload" at the Asterisk console.
So now, anytime I have voicemail at the Help Desk, it's indicated in the Nagios monitoring screen as a critical alert. Also, anytime any of my servers or services are unavailable, I can get a phone call on either my home phone or my cell phone. This means that my customers don't HAVE to have those phone numbers and I can still provide quality service to them.
Now I realize that I have a unique situation, but I hope that this article serves as an example of how to create custom Nagios service checks and notifications, as well as hinting at some of the integration options available in Asterisk.
__________________________
Mike Diehl is a freelance Computer Nerd specializing in Linux administration, programing, and VoIP. Mike lives in Albuquerque, NM. with his wife and 3 sons. He can be reached at [email protected]
check_by_telnet
checks cpu, memory, hard-disk by telnet connection
This is the second article in a two-part series that looks at a hands-on approach to monitoring a data center using the open source tools Ganglia and Nagios. In Part 2, learn how to install and configure Nagios, the popular open source computer system and network monitoring application software that watches hosts and services, alerting users when things go wrong. The article also shows you how to unite Nagios with Ganglia (from Part 1) and add two other features to Nagios for standard clusters, grids, and clouds to help with monitoring network switches and the resource manager.
Data centers are growing and administrative staffs are shrinking, necessitating efficient monitoring tools for compute resources. Part 1 of this series discussed the benefits of using Ganglia and Nagios together, then showed you how to install and extend Ganglia with homemade monitoring scripts.
Recall from Part 1 the multiple definitions of monitoring (depending on the implier and the inferrer):
- If you're running applications on the cluster, you think: "When will my job run? When will it be done? And how is it performing compared to last time?"
- If you're the operator in the network operations center, you think: "When will we see a red light that means something needs to be fixed and a service call placed?"
- If you're in the systems engineering group, you think: "How are our machines performing? Are all the services functioning correctly? What trends do we see, and how can we better utilize our compute resources?"
You can find code to monitor exactly what you want to monitor and that code can be of the open source variety. The most difficult part of using open source monitoring tools comes when you attempt to implement an install and puzzle out a configuration that works well for your environment. Two major problems with open source (and commercial) monitoring tools are the following:
- No tool will monitor everything you want the way you want it.
- Much customization could be required to get the tool working in your data center exactly how you want it.
Ganglia is a tool that monitors data centers and is used heavily in high-performance computing environments (but it's attractive for other environments too like clouds, render farms, and hosting centers). It is more concerned with gathering metrics and tracking them over time compared with Nagios's focus as an alerting mechanism. Ganglia used to require an agent to run on every host to gather information from it, but now metrics can be obtained from just about anything through Ganglia's spoofing mechanism. Ganglia doesn't have a built-in notification system, but it was designed to support scalable built-in agents on target hosts.
After reading Part 1, you could install Ganglia, as well as answer the monitoring questions that different user groups tend to ask. You could also configure the basic Ganglia setup, use the Python modules to extend functionality with IPMI (the Intelligent Platform Management Interface), and use Ganglia host spoofing to monitor IPMI.
Now, let's look at Nagios.
Introducing Nagios
This part shows you how to install Nagios and tie Ganglia back into it. We're going to add two features to Nagios that'll help your monitoring efforts in standard clusters, grids, clouds (or whatever your favorite buzzword is for scale-out computing). The two features are all about:
- Monitoring network switches
- Monitoring the resource manager
In this case, we'll be monitoring TORQUE. When we are finished, you'll have a framework to control the monitoring system of your entire data center.
Nagios, like Ganglia, is used heavily in HPC and other environments, but Nagios is more of an alerting mechanism than Ganglia (which is more focused on gathering and tracking metrics). Nagios previously only polled information from its target hosts, but has recently developed plug-ins that allow it to run agents on those hosts. Nagios has a built-in notification system.
Now let's install Nagios and set up a baseline monitoring system of an HPC Linux® cluster to address the three different monitoring perspectives:
- The application person can see how full the queues are and see available nodes for running jobs.
- The NOC can be alerted of system failures or see a shiny red error light on the Nagios Web interface. They also get notified via email if nodes go down or temperatures get too high.
- The system engineer can graph data, report on cluster utilization, and make decisions on future hardware acquisitions.
check_openmanage is a plugin for Nagios that checks the hardware health of Dell PowerEdge and PowerVault servers. It uses the Dell OpenManage Server Administrator (OMSA) software to accomplish this task. check_openmanage can be used remotely with SNMP or locally with NRPE. The plugin checks the health of the storage subsystem, power supplies, memory modules, temperature probes, etc., and gives an alert if any of the components are faulty or operate outside normal parameters.
Changes: The --global option was added, which turns on checking of everything. If used with SNMP, the global system health status is also probed, to protect the user against bugs in the... plugin. If used with omreport, the overall chassis health is used. Support for SNMP version 3 was added. Checking of esmhealth was added, which checks the overall health of the ESM log, i.e. the fill grade. Alert log reporting was fixed to use the same format as for the ESM log. Output messages are now sorted by severity. Minor changes were made in how out-of-date controller firmware/driver is reported
Nagiosgraph is an add-on for Nagios. It collects service perfdata in RRD format, and displays the resulting graphs via CGI.
November 20, 2008 | Linux.com
System monitoring tool Nagios offers a powerful mechanism for receiving events and commands from external applications. External commands are usually sent from event handlers or from the Nagios Web interface. You will find external commands most useful when writing event handlers for your system, or when writing an external application that interacts with Nagios.
This article is excerpted from the newly published book Learning Nagios 3.0 from Packt Publishing.
The external commands pipe is a pipe file created on a filesystem that Nagios uses to receive incoming messages. The communication does not use any authentication or authorization -- the only requirement is to have write access to the pipe file, rw/nagios.cmd, which is located in the directory passed as the localstatedir option during compilation.
An external command file is usually writable by the owner and the group; the usual group used is nagioscmd. If you want a user to be able to send commands to the Nagios daemon, simply add that user to this group.
A small limitation of the command pipe is that there is no way to get any results back, so it is not possible to send any query commands to Nagios. Therefore, by just using the command pipe, you have no verification that the command you have passed to Nagios has been processed, or will be processed soon. It is, however, possible to read the Nagios log file and check whether it indicates that the command has been parsed correctly.
The Nagios Web interface uses an external command pipe to control how Nagios works. The Web interface does not use any other means to send commands or apply changes to Nagios.
From the Nagios daemon perspective, there is no clear distinction as to who can perform what operations. Therefore, if you plan to use the external command pipe to allow users to submit commands remotely, you need to make sure that authorization is in place so that unauthorized users cannot send potentially dangerous commands to Nagios.
The syntax for formatting commands is easy. Each command must be placed on a single line and end with a newline character. The syntax is as follows:
[TIMESTAMP] COMMAND_NAME;argument1;argument2;...;argumentN
TIMESTAMP is written as Unix time -- that is, the number of seconds since 1970-01-01 00:00:00. You can create this by using the date command. Most programming languages also offer the means to get the current Unix time.
Commands are written in upper case. The arguments depend on the actual command. For example, to add a comment to a host stating that it has passed a security audit, you can use the following shell command:
echo "[`date +%s`] ADD_HOST_COMMENT;somehost;1;Security Audit;This host has passed security audit on `date +%Y-%m-%d`" >/var/nagios/rw/nagios.cmd
This will send an ADD_HOST_COMMENT command to Nagios over the external command pipe. Nagios will then add a comment to the host, somehost, stating that the comment originated from Security Audit. The first argument specifies the host name to add the comment to; the second tells Nagios if this comment should be persistent. The next argument describes the author of the comment, and the last argument specifies the actual comment text.
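Because every external command shares the same shape, the timestamp and semicolon joining can be wrapped once. The helper below is a minimal sketch; its name is hypothetical, and the target is passed in explicitly so that it can be pointed at a scratch file while testing instead of the real pipe.

```shell
#!/bin/sh
# Submit one external command: nagios_cmd TARGET COMMAND_NAME [ARG ...]
# TARGET is the command pipe (or a plain file when testing).
nagios_cmd() {
    target="$1"; shift
    # Start with "[unix-time] COMMAND_NAME", then append each argument after a ';'
    line="[$(date +%s)] $1"; shift
    for arg in "$@"; do
        line="$line;$arg"
    done
    # One command per line, terminated by a newline
    printf '%s\n' "$line" >> "$target"
}
```

The host comment from the example above would then be sent with `nagios_cmd /var/nagios/rw/nagios.cmd ADD_HOST_COMMENT somehost 1 "Security Audit" "This host has passed security audit"`. Arguments must not themselves contain semicolons or newlines.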
Similarly, adding a comment to a service requires the use of the ADD_SVC_COMMENT command. The command's syntax is similar to that of the ADD_HOST_COMMENT command except that the command requires the specification of the host name and service name.
You can also delete a single comment or all comments using the DEL_HOST_COMMENT, DEL_ALL_HOST_COMMENTS, and DEL_SVC_COMMENT or DEL_ALL_SVC_COMMENTS commands.
Other commands worth mentioning are related to scheduling checks on demand. Often, it is necessary to request that a check be carried out as soon as possible; for example, when testing a solution.
You can create a script that schedules a check of a host, all services on that host, and a service on a different host, as follows:
#!/bin/sh
NOW=`date +%s`
echo "[$NOW] SCHEDULE_HOST_CHECK;somehost;$NOW" \
    >/var/nagios/rw/nagios.cmd
echo "[$NOW] SCHEDULE_HOST_SVC_CHECKS;somehost;$NOW" \
    >/var/nagios/rw/nagios.cmd
echo "[$NOW] SCHEDULE_SVC_CHECK;otherhost;Service Name;$NOW" \
    >/var/nagios/rw/nagios.cmd
exit 0
The commands SCHEDULE_HOST_CHECK and SCHEDULE_HOST_SVC_CHECKS accept a host name and the time at which the check should be scheduled. The SCHEDULE_SVC_CHECK command requires the specification of a service description as well as the name of the host to schedule the check on.
Normal scheduled checks, such as the ones scheduled above, might not actually take place at the time that you scheduled them. Nagios also needs to take allowed time periods into account as well as checking whether checks were disabled for a particular object or globally for the entire Nagios.
There are cases when you'll need to force Nagios to do a check -- in such cases, you should use the SCHEDULE_FORCED_HOST_CHECK, SCHEDULE_FORCED_HOST_SVC_CHECKS, and SCHEDULE_FORCED_SVC_CHECK commands. They work in exactly the same way as described above, but make Nagios skip the checking of time periods and of whether checks are disabled for this particular object. This way, a check will always be performed, regardless of other Nagios parameters.
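A forced service check takes the same arguments as SCHEDULE_SVC_CHECK: host name, service description, and the time at which to run. The sketch below only formats the line; the function name, host, and service are hypothetical, and the output would be redirected to the command pipe in real use.

```shell
#!/bin/sh
# Format a forced, immediate check of one service.
# $1 = host name, $2 = service description, $3 = optional Unix time (defaults to now)
force_svc_check() {
    ts="${3:-$(date +%s)}"
    # The same time is used as the submission timestamp and as the requested check time
    printf '[%s] SCHEDULE_FORCED_SVC_CHECK;%s;%s;%s\n' "$ts" "$1" "$2" "$ts"
}
```

For example, `force_svc_check otherhost "Service Name" >/var/nagios/rw/nagios.cmd` requests the check even outside the service's check period.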
Other commands worth using are related to custom variables, introduced in Nagios 3. When you define a custom variable for a host, service, or contact, you can change its value on the fly with the external command pipe.
As these variables can then be directly used by check or notification commands and event handlers, it is possible to make other applications or event handlers change these attributes directly without modifications to the configuration files.
How might this work? Suppose that the IT staff registers its presence via an application without any GUI. This application periodically sends information about the latest known IP address, and that information is then passed to Nagios assuming that the person is in the office. This would later be sent to a notification command to use that specific IP address while sending a message to the user.
Assuming that the user name is jdoe and the custom variable name is DESKTOPIP, the message that would be sent to the Nagios external command pipe would be as follows:
[1206096000] CHANGE_CUSTOM_CONTACT_VAR;jdoe;DESKTOPIP;12.34.56.78
This would cause a subsequent use of $_CONTACTDESKTOPIP$ to return a value of 12.34.56.78.
Nagios offers the CHANGE_CUSTOM_CONTACT_VAR, CHANGE_CUSTOM_HOST_VAR, and CHANGE_CUSTOM_SVC_VAR commands for modifying custom variables in contacts, hosts, and services.
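The host and service variants follow the same pattern as the contact example, with the service variant taking the service description as an extra argument after the host name. The sketch below formats both; the function names, host, service, and variable names are hypothetical, and the variable is assumed to be already defined on the object in the configuration.

```shell
#!/bin/sh
# Format commands that change custom variables on a host or a service.
# An optional trailing argument supplies the Unix timestamp (defaults to now).

# set_host_var HOST VARNAME VALUE [TIMESTAMP]
set_host_var() {
    ts="${4:-$(date +%s)}"
    printf '[%s] CHANGE_CUSTOM_HOST_VAR;%s;%s;%s\n' "$ts" "$1" "$2" "$3"
}

# set_svc_var HOST SERVICE VARNAME VALUE [TIMESTAMP]
set_svc_var() {
    ts="${5:-$(date +%s)}"
    printf '[%s] CHANGE_CUSTOM_SVC_VAR;%s;%s;%s;%s\n' "$ts" "$1" "$2" "$3" "$4"
}
```

After `set_host_var somehost RACK B12 >/var/nagios/rw/nagios.cmd`, a command definition that uses $_HOSTRACK$ would see the new value.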
The commands explained above are just a small subset of the full capabilities of the Nagios external command pipe. For a complete list of commands, visit the External Command List.
Nov 28, 2005
Tags: monitoring, nagios
Nagios is a powerful, modular network monitoring system that can be used to monitor many network services like smtp, http and dns on remote hosts. It also has support for snmp to allow you to check things like processor loads on routers and servers. I couldn't begin to cover all of the things that nagios can do in this article, so I'll just cover the basics to get you up and running.
apt-get install nagios-text
First we need to define the people who will be notified, and how they should be notified. In the example below, I define two users, joe and paul. Joe is the network guru and cares about routers and switches. Paul is the systems guy, and he cares about servers. Both will be notified via email and by pager. Note that if you are going to monitor your email server, you will want to use another notification method besides email. If your email server is down, you can't send anybody an email to notify them! :) In that case you will want to use a pager server to send a text message to a phone or pager, or set up a second nagios monitor that uses a different mail server to send email.
Edit /etc/nagios/contacts.cfg and add the following users:
define contact{
        contact_name                    joe
        alias                           Joe Blow
        service_notification_period     24x7
        host_notification_period        24x7
        service_notification_options    w,u,c,r
        host_notification_options       d,u,r
        service_notification_commands   notify-by-email,notify-by-epager
        host_notification_commands      host-notify-by-email,host-notify-by-epager
        email                           [email protected]
        pager                           [email protected]
        }
define contact{
        contact_name                    paul
        alias                           Paul Shiznit
        service_notification_period     24x7
        host_notification_period        24x7
        service_notification_options    w,u,c,r
        host_notification_options       d,u,r
        service_notification_commands   notify-by-email,notify-by-epager
        host_notification_commands      host-notify-by-email,host-notify-by-epager
        email                           [email protected]
        pager                           [email protected]
        }
Now add the users to groups.
In /etc/nagios/contactgroups.cfg add the following:
define contactgroup{
        contactgroup_name       router_admin
        alias                   Network Administrators
        members                 joe
        }
define contactgroup{
        contactgroup_name       server_admin
        alias                   Systems Administrators
        members                 paul
        }
You can add multiple members to a contact group by listing comma-separated users.
Now to define some hosts to monitor. For my example, I define two machines, a mail server and a router.
Edit /etc/nagios/hosts.cfg and add:
define host{
        use                     generic-host
        host_name               gw1.yourdomain.com
        alias                   Gateway Router
        address                 10.0.0.1
        check_command           check-host-alive
        max_check_attempts      20
        notification_interval   240
        notification_period     24x7
        notification_options    d,u,r
        }
define host{
        use                     generic-host
        host_name               mail.yourdomain.com
        alias                   Mail Server
        address                 10.0.0.100
        check_command           check-host-alive
        max_check_attempts      20
        notification_interval   240
        notification_period     24x7
        notification_options    d,u,r
        }
Now we add the hosts to groups. I define groups called 'routers' and 'servers' and add the router and mail server respectively.
Edit /etc/nagios/hostgroups.cfg
define hostgroup{
        hostgroup_name  routers
        alias           Routers
        contact_groups  router_admin
        members         gw1.yourdomain.com
        }
define hostgroup{
        hostgroup_name  servers
        alias           Servers
        contact_groups  server_admin
        members         mail.yourdomain.com
        }
Again, for multiple members, just use a comma-separated list of hosts.
Next define services to monitor on each of the hosts. Nagios has many built-in plugins for monitoring. On a debian sarge system, they are stored in /usr/lib/nagios/plugins. Here we want to monitor the smtp service on the mail server, and do ping checks on the router.
Edit /etc/nagios/services.cfg
define service{
        use                     generic-service
        host_name               mail.yourdomain.com
        service_description     SMTP
        is_volatile             0
        check_period            24x7
        max_check_attempts      3
        normal_check_interval   5
        retry_check_interval    1
        contact_groups          server_admin
        notification_interval   240
        notification_period     24x7
        notification_options    w,u,c,r
        check_command           check_smtp
        }
define service{
        use                     generic-service
        host_name               gw1.yourdomain.com
        service_description     PING
        is_volatile             0
        check_period            24x7
        max_check_attempts      3
        normal_check_interval   5
        retry_check_interval    1
        contact_groups          router_admin
        notification_interval   240
        notification_period     24x7
        notification_options    w,u,c,r
        check_command           check_ping!100.0,20%!500.0,60%
        }
And that's it. To test your configuration, you can run
nagios -v /etc/nagios/nagios.cfg
If all is well, we can restart nagios and move on to the apache side to get a visual view of the monitor.
/etc/init.d/nagios restart
Assuming you have a working apache install, you can add the apache.conf file included in the nagios package to set up the nagios cgi administration interface. The web interface is not required to run nagios, but it is definitely worth setting up. The simplest way to get it up and running is to copy the supplied conf file over to our apache installation. On my system, I'm running apache2. Systems running apache 1.3.xx will have slightly different setups.
cp /etc/nagios/apache.conf /etc/apache2/sites-enabled/nagios
Of course you may want to set it up as a virtual server, but I leave that as an exercise for the reader. Now you will want to set up an allowed user to view the cgi interface. By default, nagios issues full administrative access to the nagiosadmin user. Nagios uses apache htpasswd style authentication.
Here we add the user nagiosadmin with password mypassword to the nagios htpasswd file.
htpasswd2 -nb nagiosadmin mypassword >> /etc/nagios/htpasswd.users
You should now be able to restart apache and log on to
http://your.nagios.server/nagios
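A note on the check_ping!100.0,20%!500.0,60% value in services.cfg above: the exclamation marks separate the command name from its arguments, which Nagios makes available to the command definition as $ARG1$, $ARG2$, and so on. The Python sketch below only illustrates that substitution. It is a simplification (it ignores escaping and all other macros), and the command_line template in it is an assumed, typical definition rather than one taken from this article.

```python
def split_check_command(check_command):
    """Split "name!arg1!arg2" into the command name and its $ARGn$ macros."""
    name, *args = check_command.split("!")
    macros = {"$ARG{}$".format(i): arg for i, arg in enumerate(args, start=1)}
    return name, macros


def expand(command_line, macros):
    """Substitute $ARGn$ macros into a command_line template."""
    for macro, value in macros.items():
        command_line = command_line.replace(macro, value)
    return command_line


name, macros = split_check_command("check_ping!100.0,20%!500.0,60%")

# Assumed template; the real one lives in your command definitions.
template = "$USER1$/check_ping -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$"
expanded = expand(template, macros)
```

With that template, the first argument becomes the warning threshold (round-trip time in ms, packet loss) and the second the critical threshold.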
Nagios is a very powerful tool for monitoring networks. I've only touched on the basics here, but it should be enough to get you up and running. Hopefully, once you do, you'll start experimenting with all the cool features and plugins that are available. The documentation included in the cgi interface is very detailed and helpful.
"Howto be a good (and lazy) System Administrator." A couple astute readers, after reading the article, asked if I was familiar with the Nagios monitoring system, and I am. I've been using Nagios for a few years now.I had intended to write this article as a How-to on getting Nagios configured and running for the first time. However, it turns out that the documentation that comes with Nagios is really pretty good. And even if you do have problems, and I did, the user community is also quite responsive. So, rather than beating a dead horse, (with sympathy to horse lovers) I decided to continue the Good and Lazy Administrator Theme and discuss extending Nagios with custom service checks and custom notifications.
Nagios uses a plug-in mechanism to implement all of its server and service checks as well as all of its notifications. This is good news for hackers, as it allows us to build new functionality that either no one else has thought of, or has need of. I wrote a couple of scripts for my Nagios system. One does a custom service check to see if I have voicemail waiting for me at the Help Desk, and the other does a custom notification by telephone. Before I go on, I should give a little bit of background.
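To make the plug-in contract concrete, here is a minimal sketch of what a custom service check has to produce: one line of text, optionally followed by a pipe and performance data, plus an exit code of 0 (OK), 1 (Warning), 2 (Critical), or 3 (Unknown). It is written in Python and is not the author's voicemail script; the thresholds, the label VOICEMAIL, and the way the message count is obtained are all hypothetical.

```python
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(count, warn, crit):
    """Map a measured value to an (exit_code, output_line) pair."""
    if count is None:
        # The measurement itself failed, so the state is Unknown.
        return UNKNOWN, "VOICEMAIL UNKNOWN - could not read message count"
    if count >= crit:
        code = CRITICAL
    elif count >= warn:
        code = WARNING
    else:
        code = OK
    # Text before the pipe is the status line; text after it is perfdata.
    line = "VOICEMAIL {} - {} message(s) waiting | messages={};{};{}".format(
        LABELS[code], count, count, warn, crit)
    return code, line


def run(count, warn=1, crit=5):
    """Print the status line and return the exit code for the plug-in."""
    code, line = evaluate(count, warn, crit)
    print(line)
    return code
```

In a real plug-in file the last step would be something like sys.exit(run(count)), with count coming from whatever system is being checked; Nagios only looks at the printed line and the exit code.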
About: Nagstamon is a Nagios status monitor with a UI that resides in the GNOME systray or on the Windows desktop. It informs you in realtime about the status of your Nagios monitored network.
Changes: This release fixes a problem with passwords containing special characters, and an issue where it omitted showing failed services on hosts in scheduled downtime.
About: check_oracle_health is a plugin for the Nagios monitoring software that allows you to monitor various metrics of an Oracle database. It includes connection time, SGA data buffer hit ratio, SGA library cache hit ratio, SGA dictionary cache hit ratio, SGA shared pool free, PGA in memory sort ratio, tablespace usage, tablespace fragmentation, tablespace I/O balance, invalid objects, and many more.
Release focus: Major feature enhancements
Changes: The tablespace-usage mode now takes into account when tablespaces use autoextents. The data-buffer/library/dictionary-cache-hitratio are now more accurate. Sqlplus can now be used instead of DBD::Oracle.
About: check_lm_sensors is a Nagios plugin to monitor the values of on-board sensors and hard disk temperatures on Linux systems.
Changes: The plugin now uses the standard Nagios::Plugin CPAN classes, fixing issues with embedded perl.
check_logfiles 2.3.3 (Default)
Added: Sun, Mar 12th 2006 15:09 PDT (2 years, 1 month ago)
Updated: Tue, May 6th 2008 10:37 PDT (today)
About: check_logfiles is a plugin for Nagios which checks logfiles for defined patterns. It is capable of detecting logfile rotation. If you tell it how the rotated archives look, it will also examine these files. Unlike check_logfiles, traditional logfile plugins were not aware of the gap which could occur, so under some circumstances they ignored what had happened between their checks. A configuration file is used to specify where to search, what to search, and what to do if a matching line is found.
Spot on for a well structured book with many WOW-factors,
May 17, 2007 By Nils Valentin (Tokyo, Japan) - See all my reviews
--- DISCLAIMER: This is a requested review by PTR, however any opinions expressed within the review are my personal ones. ---
Introduction - 6p
CHAPTER 1 Best Practices - 12p
CHAPTER 2 Theory of Operations - 26p
CHAPTER 3 Installing Nagios - 11p
CHAPTER 4 Configuring Nagios - 23p
CHAPTER 5 Bootstrapping the Configs - 10p
CHAPTER 6 Watching - 46p
CHAPTER 7 Visualization - 42p
CHAPTER 8 Nagios Event Broker Interface - 19p
APPENDIX A Configure Options - 3p
APPENDIX B nagios.cfg and cgi.cfg - 9p
APPENDIX C Command-Line Options - 10p
Index - 14p
At 190 pages (230 including the appendices and index), the book is very compact. It teaches you Nagios in a way I have never heard or read before. I assume that the author's clearly structured style, which runs through the book like a red thread, is responsible for the excellent outcome.
The book starts in the introduction with the title "Do it right the first time", and that hits the spot. What sets this little portable knowledge base apart is its exceptionally well-thought-out content and the author's explanations. David is not filling pages by explaining each and every parameter, but rather showing you the big picture, and explaining how to approach new issues or why one technical solution is better than another.
This is the book you should pass to your manager so (s)he understands why and how an open solution like Nagios is the better choice and can be used for achieving surpassing solutions.
The book itself basically is divided in two sections:
Background, setup and configuration - Chapters 1-5
Advanced Topics - Chapters 6-8
I found all of the chapters to have a nice balance in the amount of information needed, but some EXCEPTIONALLY good parts of the book were:
Chapter 1 Best practices
Chapter 2 - the part about scheduling
Chapters 6-8 as a whole
Chapter 6 has thorough explanations of monitoring the different OSes (especially the Windows part !!) and other applications.
Chapter 7 stands out for its overall thoroughness on how to visualize your data to reach a better understanding of the systems and network you are monitoring.
Chapter 8 describes a filesystem-based status interface. The NEB module will write a file with its current status code for each service. I have to admit that some technical details went over my head, but I thought that was pretty cool !!
The points featured above are what I found to be exceptionally good, and most likely the strongest selling points for this little portable knowledge base. That doesn't mean that the other parts of the book are weak, mind you.
Funnily enough, the points mentioned above were EXACTLY the ones which I haven't seen explained this thoroughly anywhere before.
So David's book was exactly spot on for me.
Summary:
To sum it all up in very simple words: This is a hell of a book !!
It's the most compact, well-structured book on Nagios that I have seen to date. It contains many WOW-factors. While reading each chapter you can virtually "feel" how David's explanations, tips, and tricks have already helped you avoid time-consuming pitfalls.
So this book is not about "to buy or not to buy"; this is an investment you don't want to miss !!
I was especially impressed by the thoroughness with which the book is written, from the first page on. Although the contents of the first chapter weren't new to me, the way they were explained already provided many of those A-ha moments.
The main asset of the book is not the description of the tools themselves, but rather the thought and consideration the author put into it, and the sharing of those thoughts in a way that lets the reader visualize how and why one solution is better than another, without having the "luxury of experiencing the pitfalls" in a live disaster scenario.
PS: AFTER I finished reading the book I re-read the "Editorial Review" Amazon gave above and found it pretty well describing the actual book and what you should expect.
>> You can find more reviews on Nagios related books including a comparison by deploying my profile. <<
With the Nagios Looking Glass (NLG) tool, developer Andy Shellam has tried to resolve a common problem for network administrators running Nagios. What happens if you want to provide access to up-to-date information from Nagios without giving users access to the full Nagios console? Providing read-only access to the Nagios console can be complicated, and can occasionally require network re-structuring or can even pose a security risk.
NLG is designed to fix those issues by taking a feed from Nagios status data via an HTTP connection and displaying it on a public Web server. It works in a client-server model with a PHP-based polling server installed on your Nagios server. A receiver client, also PHP-based, is installed on your Web server. If you want to use NLG locally, you can also run the client and the server together on your Nagios server. The receiver client creates an AJAX-enabled page based on a template. You can also customize this template to display whatever you require.
You can see a demo of NLG at http://looking-glass.andyshellam.eu/demo/.
02.05.2007
01.29.2007 | searchenterpriselinux.techtarget.com
The Nagios enterprise monitoring tool generates a variety of events. The principal events generated are the results of monitoring applications, databases, devices, services and hosts. Also generated are performance data and notification events such as outages and downtime. There are a number of ways to integrate and utilize these events. The most advanced and effective event integration mechanism is the Nagios Event Broker (NEB).
NEB uses callback routines that are executed when events occur in the Nagios server. Using NEB you can write broker modules that can process these events. NEB allows you to output and integrate events into a variety of tools including MySQL databases, SNMP traps, syslog messages or use the event data in a variety of other applications and tools.
Nagios Event Broker functions and triggers
NEB uses shared code libraries called modules that are hooked into the Nagios server when it is executed. Each module can register callback procedures that are able to receive and process events. When an event occurs, NEB checks for the presence of a registered callback and, if detected, sends the event to the module. The module receives the event and performs whatever actions are coded into it.
The broker can process a large number of events including, amongst others:
- Nagios process startup and shutdown
- Host and Service checks
- Plug-in commands and notifications
- External commands and event handlers
- Flapping, comments and downtime
You can see a full list of the callbacks in the nebcallbacks.h include file located in the include directory of the Nagios source package.
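The register-then-dispatch flow described above can be modelled in a few lines. The sketch below is a conceptual model in Python only: real NEB modules are C shared objects that use the declarations in nebcallbacks.h, and the class, method, and event names here are invented for illustration.

```python
from collections import defaultdict


class Broker:
    """Toy model of an event broker: modules register callbacks per event type."""

    def __init__(self):
        self._callbacks = defaultdict(list)

    def register(self, event_type, callback):
        """Called by a module at load time to subscribe to one event type."""
        self._callbacks[event_type].append(callback)

    def dispatch(self, event_type, data):
        """Send an event to every registered callback; return how many ran."""
        handlers = self._callbacks.get(event_type, [])
        for callback in handlers:
            callback(event_type, data)
        return len(handlers)


# A "module" that records service check results (event name is hypothetical).
seen = []
broker = Broker()
broker.register("SERVICE_CHECK", lambda event_type, data: seen.append(data))

delivered = broker.dispatch("SERVICE_CHECK", {"host": "mail", "state": 0})
ignored = broker.dispatch("HOST_CHECK", {"host": "gw1", "state": 0})
```

Events with no registered callback are simply dropped, which is why an unloaded or idle module costs the daemon very little.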
Enabling Nagios Event Broker
NEB should be enabled by default when you compile Nagios (unless you disable it). If you want to ensure that NEB gets compiled then specify the --enable-neb configure option when configuring Nagios.
# ./configure --enable-neb
Registering modules with Nagios Event Broker
Modules are included into the Nagios configuration by using broker_module configuration options in the nagios.cfg configuration file. For example:
broker_module=/usr/local/nagios/bin/testmodule.o
This line would load a module called testmodule.o located in the /usr/local/nagios/bin directory. You can also specify a configuration file for a module like so:
broker_module=/usr/local/nagios/bin/testmodule.o config_file=/usr/local/nagios/etc/testmodule.cfg
You need to restart Nagios for any newly defined modules to take effect.
Writing modules for Nagios Event Broker
NEB Modules can be written in C or C++. You can see an example of a module in the Nagios package. Located in the module directory off the root of the package is the helloworld module. You can create it by compiling the helloworld.c file.
# gcc -shared -o helloworld.o helloworld.c
You can then add this module to Nagios using the broker_module directive in the nagios.cfg configuration file. Restart Nagios and the module is now loaded.
The Helloworld module is extremely simple. Helloworld logs a message to the default Nagios log file when Nagios is started and stopped and when aggregated status updates start and finish. The message looks like:
[1137151111] helloworld: An aggregated status update just started.
[1137151112] helloworld: An aggregated status update just finished.
You can review the contents of this module (which includes some basic inline documentation).
Available modules for Nagios Event Broker
There are not a lot of NEB modules available so far. The best-known NEB module is the NDO Utilities module. It is written by Nagios' developer, Ethan Galstad, and is designed to output events and data from Nagios to a standard file or a Unix socket. It also comes with a module, NDO2DB, that can write Nagios data to a MySQL or PostgreSQL database. It should provide (together with the helloworld module) a good introduction to NEB and help you get started on writing your own modules.
You can also find the following NEB modules:
- A NEB module that logs to a socket based on client requests
- A NEB module (as yet unreleased) that does event correlation with Nagios and SEC
- A NEB module that helps integrate Cacti with Nagios
Further help with Nagios Event Broker
There is not a lot of documentation available for NEB thus far. The only major piece of documentation available is about the NEB API. You can also review the Nagios source code relevant to NEB, particularly the include files.
As always the Nagios development and user mailing lists are good starting places for assistance.
Dec 05, 2005 | Sys Admin
In the past few years, Nagios has become the industry standard open source systems monitoring tool. If you're using an open source app to monitor the availability, state, or utilization of your servers or network gear, then chances are you are using Nagios to do it. To those who have worked with it, this is no surprise. The lightweight design of Nagios offloads the actual query logic into "plug-ins", which are easily created, modified, and re-purposed by sys admins. The lack of complex query logic leaves the Nagios daemon free to manage scheduling and notifications and to handle UI.
Nagios's "keep it simple" approach makes it straightforward to administer, network transparent, and amazingly flexible.
Two excellent articles by Syed Ali in previous editions of Sys Admin covered the installation and configuration of Nagios. In this article, I'll pick up where those articles left off and provide some creative solutions to problems commonly faced by sys admins working with Nagios to monitor the health and performance of systems.
There was nothing technically wrong with the HP ProLiant servers at Mynewplace.com, an online rental services agency based in San Francisco, but the IT staff kept on getting beeped at 4 a.m. with alerts that eventually proved to be false alarms.
So while the servers were fine, the IT staff wasn't. Entire days were being wasted each month diagnosing their clutch of 50 HP ProLiant DL145s and DL385s running Red Hat Enterprise Linux 4 AS and ES, said John Shin, Mynewplace.com's director of systems. Shin decided he needed to make some changes.
Struggling with network monitoring
"We were struggling with monitoring," Shin said, but that may have been an understatement. Things were so bad, in fact, that at one point last year he contemplated disabling the monitoring application altogether because it was doing more harm than good.
The application was Nagios, a popular open source systems and network monitoring application that provides alerts for user-defined hosts and services. In Shin's network, however, it was triggering false alarms because of simple network management protocol [SNMP] incompatibilities with Mynewplace.com's open source application server, Resin 3.0. Resin is based on a Java implementation of the PHP scripting language and is maintained and supported by San Diego-based Caucho Technology Inc.
Nagios, JVM and Resin 3.0 woes
Since Resin and Nagios were not directly compatible, Shin would expose the application stack's Java virtual machines (JVMs) through SNMP and monitor the environment that way. Unfortunately, response times under those conditions were sluggish, he said.
"Nagios was not really the problem," Shin said. "It was the JVM stack not being able to respond to it correctly. It was recording events in SNMP that were then watched by Nagios and that made things crawl. There were a lot of man hours wasted, and it would trigger the 4 a.m. pages."
In spite of its popularity on open source repositories like SourceForge.net, Nagios has its detractors. In a recent interview about Nagios with SearchEnterpriseLinux.com, Zenoss Inc. CEO Bill Karpovich criticized Nagios for its lack of enterprise-level support. "The maintainers never thought of it as a project that an IT manager would use to monitor an entire enterprise environment," he said. Zenoss is an open source startup vendor in the systems management space.
... ... ...
The feature-rich, expensive offerings from HP and the other members of the "big four" – IBM, CA and BMC – have spawned the "little four" (a phrase coined by analyst firm RedMonk), comprised of Hyperic, Zenoss, Qlusters and GroundWork. Executives from those companies have bet their chips on the valuable midmarket for customer wins like Mynewplace.com.
Compared with OpenView, offerings from the "little four" were priced approximately two-and-a-half times less on average, Shin found, although he would not cite specific dollar amounts. OpenView had another strike against it: "It did not have the framework in place to monitor some of our key applications," namely Resin and Postgres, Shin said.
Nagios is a free, open source enterprise monitoring tool designed to run on Linux. It has extensive monitoring and management capabilities that allow you to check applications, databases and network devices, as well as Windows and Unix/Linux hosts and services. It is easy to install, fast to configure and highly customizable.
Nagios also comes with a Web-based console, an extensible Nagios Event Broker (NEB) that allows you to integrate Nagios with other tools, such as database back-ends, and a large collection of monitoring commands and capabilities. Its current release, version 2.0, is stable and production ready. You can take a look at Nagios at http://www.nagios.com.
Development of Nagios has not stopped with version 2.0, though. Nagios' principal developer, Ethan Galstad, has recently released some information on the status and potential features of the next release, version 3.0. Galstad's announcement also suggests an alpha release of version 3.0 could be scheduled as early as the end of February 2007.
Features: What's new in Nagios 3.0
So what's new with version 3.0? Well, a lot. Let's walk through the major new features and look at how some of Nagios' old features have been expanded or changed.
One of the interesting features introduced in Nagios 2.0 was adaptive monitoring. Adaptive monitoring allowed a Nagios configuration to be changed during runtime. For example, you can change the command being used to check a host, based on changing conditions in your environment. In the new version, this functionality is expanded to include the ability to change the times during which checks are scheduled to occur. This allows you to turn on/off checks at specific times according to conditions in your environment.
Notifications have also been enhanced, now allowing a delay to be added to first notifications. Notifications can be generated when flapping is disabled and, most importantly, notifications can now be sent out when a scheduled downtime starts, ends or is cancelled.
Objects and templates haven't been forgotten either. One particularly useful change is the ability to use multiple templates for objects. Another is the addition of custom variables in host, service and contact objects. Version 2.0 only allows the application of one template to an object. Multiple templates offer greater flexibility and power, which will make a significant difference to the configuration of objects.
Custom variables allow you to define your own directives in object definitions and, therefore, attach additional information about an object to its definition. These variables can be retrieved and used elsewhere in your Nagios environment. For instance, you could define the SNMP community strings for a host in its definition and then use these later in a check or external command.
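As an illustration of how a custom variable is later referenced, the sketch below derives the macro name from the object type and the directive name. It assumes the Nagios 3 conventions that custom directives start with an underscore and that the macro is the upper-cased name prefixed with _HOST, _SERVICE, or _CONTACT. The directive _snmp_community is a made-up example, and the Python helper itself is illustrative and not part of Nagios.

```python
PREFIXES = {"host": "_HOST", "service": "_SERVICE", "contact": "_CONTACT"}


def custom_macro(object_type, directive):
    """Return the macro that exposes a custom variable, e.g. $_HOSTNAME$ style."""
    if not directive.startswith("_"):
        raise ValueError("custom variable names must start with an underscore")
    # Drop the leading underscore and upper-case the rest of the name.
    return "${}{}$".format(PREFIXES[object_type], directive[1:].upper())


# A hypothetical "_snmp_community" directive on a host object:
host_macro = custom_macro("host", "_snmp_community")

# The contact variable used with CHANGE_CUSTOM_CONTACT_VAR:
contact_macro = custom_macro("contact", "_DESKTOPIP")
```

A check command could then pass the host macro to an SNMP plug-in instead of hard-coding the community string in the command definition.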
Other object and template changes include: merging service and host-extended information object data into service and host object definitions, and adding group member directives to the host and service group objects.
Enhancements to external commands are also present, including the ability to process commands found in an external file. The suggested use of this functionality is for passive checks with long output or complicated scripting. A further change in Nagios 3.0 is that external command checking is now turned on by default. In previous versions, it was off by default.
Host and service logic alterations have also been made. Most notably, host checks now run asynchronously in parallel with each other. This should help balance overall check performance. Another enhancement is the ability to cache host and service check results and a function to enable the predictive checking of dependent hosts and services.
The ability to output multiple lines of data from host and service checks has also been added. Previously, Nagios 2.0 was limited to a single line of output from checks, thus reducing the utility of some checks. Now, multiple lines can be received and processed by Nagios and the size of plug-in output has been correspondingly increased to 2Kbs.
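To show what multi-line output means for anything that consumes plug-in results, here is a simplified Python parser. It assumes the layout described in the Nagios 3 plug-in guidelines as I understand them: the first line is the short status text with optional performance data after a pipe, further lines are long text, and after a second pipe everything is additional performance data. The sample text is invented, and the real daemon applies size limits and escaping that this sketch ignores.

```python
def parse_plugin_output(raw):
    """Split plug-in output into (short_text, long_text_lines, perfdata_lines)."""
    lines = raw.splitlines()
    first, _, perf = (lines[0] if lines else "").partition("|")
    short_text = first.strip()
    perfdata = [perf.strip()] if perf.strip() else []
    long_text = []
    in_perfdata = False
    for line in lines[1:]:
        if in_perfdata:
            # After the second pipe, every remaining line is perfdata.
            if line.strip():
                perfdata.append(line.strip())
        elif "|" in line:
            text, _, perf = line.partition("|")
            long_text.append(text.strip())
            if perf.strip():
                perfdata.append(perf.strip())
            in_perfdata = True
        else:
            long_text.append(line.strip())
    return short_text, long_text, perfdata


sample = (
    "DISK OK - / 56% used | /=2643MB\n"
    "/boot 69% used\n"
    "/home 27% used | /boot=68MB\n"
    "/home=69357MB"
)
short_text, long_text, perfdata = parse_plugin_output(sample)
```

Under Nagios 2.0 only the first line of such output would have been kept.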
A number of performance optimizations have been included in Nagios 3.0, as well as enhancements to the Nagios Event Broker and the embedded Perl interpreter. Also worth mentioning are updates to macros and to status, comment and retention data.
To see a full list of the changes, or if you wish to try Nagios 3.0 before its alpha release, you can download a current CVS snapshot from http://www.nagios.org/development/cvs.php . The Changelog file contained in the snapshot provides a reasonably full list of the proposed changes.
HowToContactNagios - Munin - Trac
Munin integrates perfectly with Nagios. There are, however, a few things of which to take notice. This article shows example configurations and explains the communication between the systems.
Receiving messages in Nagios
First you need a way for Nagios to accept messages from Munin. Nagios has exactly such a thing, namely the NSCA which is documented here: http://nagios.sourceforge.net/docs/1_0/addons.html#nsca.
NSCA consists of a client (a binary usually named send_nsca) and a server usually run from inetd. We recommend that you enable encryption on NSCA communication.
You also need to configure Nagios to accept messages via NSCA. NSCA is, unfortunately, not very well documented in Nagios' official documentation. We'll cover writing the needed service check configuration further down in this document.
Configuring Nagios
In the main config file, make sure that the command_file directive is set and that it works. See http://nagios.sourceforge.net/docs/2_0/configmain.html#command_file for details.
Below is a sample extract from nagios.cfg:
command_file=/var/run/nagios/nagios.cmd
The /var/run/nagios directory is owned by the user Nagios runs as. The nagios.cmd file is a named pipe on which Nagios accepts external input.
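Since the command file is a named pipe, a local script can also submit a passive check result by writing one line to it. The sketch below is a minimal Python illustration, assuming the standard PROCESS_SERVICE_CHECK_RESULT external command and the pipe path from the sample above; the function names are hypothetical.

```python
import time


def format_passive_result(host, service, return_code, output, now=None):
    """Build one external-command line for a passive service check result.

    Layout: [timestamp] PROCESS_SERVICE_CHECK_RESULT;host;service;code;output
    """
    timestamp = int(time.time() if now is None else now)
    return (
        f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;"
        f"{host};{service};{return_code};{output}\n"
    )


def submit(command_line, pipe_path="/var/run/nagios/nagios.cmd"):
    """Write the command to the Nagios command pipe.

    Opening a FIFO for writing blocks until a reader (the Nagios daemon)
    has it open, so this hangs if Nagios is not running.
    """
    with open(pipe_path, "w") as pipe:
        pipe.write(command_line)
```

For example, `submit(format_passive_result("foo.example.com", "test", 0, "all good"))` would hand Nagios an OK result, provided the calling user has write permission on the pipe.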
Configuring NSCA, server side
NSCA is run through (x)inetd. Using inetd, the below line enables NSCA listening on port 5667:
5667 stream tcp nowait nagios /usr/sbin/tcpd /usr/sbin/nsca -c /etc/nsca.cfg --inetd
Using xinetd, the configuration below enables NSCA listening on port 5667, allowing connections only from the local host:
# description: NSCA (Nagios Service Check Acceptor)
service nsca
{
    flags = REUSE
    type = UNLISTED
    port = 5667
    socket_type = stream
    wait = no
    server = /usr/sbin/nsca
    server_args = -c /etc/nagios/nsca.cfg --inetd
    user = nagios
    group = nagios
    log_on_failure += USERID
    only_from = 127.0.0.1
}
The file /etc/nsca.cfg defines how NSCA behaves. Check in particular the nsca_user and command_file directives; these should correspond to the file permissions and the location of the named pipe described in nagios.cfg.
nsca_user=nagios
command_file=/var/run/nagios/nagios.cmd
Configuring NSCA, client side
The NSCA client is a binary that submits to an NSCA server whatever it received as arguments. Its behavior is controlled by the file /etc/send_nsca.cfg, which mainly controls encryption.
You should now be able to test the communication between the NSCA client and the NSCA server, and consequently whether Nagios picks up the message. NSCA requires a defined format for messages. For service checks, it's like this: <host_name>[tab]<svc_description>[tab]<return_code>[tab]<plugin_output>[newline]
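The message format above can also be assembled by a script and piped into the client. The following Python sketch is one possible way to do it, assuming the send_nsca path and the -H and -c switches used elsewhere in this article; the function names and the validation rules are hypothetical.

```python
import subprocess


def nsca_message(host, service, return_code, output):
    """Build one service-check message: four tab-separated fields and a newline."""
    if return_code not in (0, 1, 2, 3):
        raise ValueError(f"invalid return code: {return_code!r}")
    for field in (host, service, output):
        # A tab or newline inside a field would corrupt the message framing.
        if "\t" in field or "\n" in field:
            raise ValueError(f"field contains a tab or newline: {field!r}")
    return f"{host}\t{service}\t{return_code}\t{output}\n"


def send(message, nagios_host="localhost",
         config="/etc/send_nsca.cfg", binary="/usr/sbin/send_nsca"):
    """Pipe the message to send_nsca on standard input and return its stdout."""
    result = subprocess.run(
        [binary, "-H", nagios_host, "-c", config],
        input=message, text=True, capture_output=True, check=True,
    )
    return result.stdout
```

Calling `send(nsca_message("foo.example.com", "test", 0, "0"))` corresponds to a manual test with the same host, service and return code. Rejecting tabs inside fields is a design choice of this sketch, not something NSCA itself enforces.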
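The following shows how to test NSCA. The second line is the message typed on standard input (its fields must be separated by tabs), and the third line is the reply from send_nsca.
$ /usr/sbin/send_nsca -H localhost -c /etc/send_nsca.cfg
foo.example.com test 0 0
1 data packet(s) sent to host successfully.
This caused the following to appear in /var/log/nagios/nagios.log: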
[1159868622] Warning: Message queue contained results for service 'test' on host 'foo.example.com'. The service could not be found!
Messages are sent by munin-limits based on the state of a monitored data source: OK, Warning and Critical. Munin does not currently support an Unknown state (this will be fixed in the future; see Ticket 29 for more information).
Configuring munin.conf
Munin uses the above-mentioned send_nsca binary to send messages to Nagios. In /etc/munin/munin.conf, enter this:
contacts nagios
contact.nagios.command /usr/bin/send_nsca -H your.nagios-host.here -c /etc/send_nsca.cfg
Be aware that the -H switch to send_nsca appeared sometime after send_nsca version 2.1. Always check send_nsca --help!
Configuring Munin plugins
Lots of Munin plugins have (hopefully reasonable) values for Warning and Critical levels. To set or override these, you can change the values in munin.conf.
Configuring Nagios services
Now Nagios needs to recognize the messages from Munin as messages about services it monitors. To accomplish this, every message Munin sends to Nagios requires a matching (passive) service definition; otherwise Nagios will ignore the message (although it will log that something tried).
A passive service is defined through these directives in the proper Nagios configuration file:
active_checks_enabled 0
passive_checks_enabled 1
A working solution is to create a template for passive services, like the one below:
define service {
    name passive-service
    active_checks_enabled 0
    passive_checks_enabled 1
    parallelize_check 1
    notifications_enabled 1
    event_handler_enabled 1
    register 0
    is_volatile 1
}
When the template is registered, each Munin plugin should be registered as shown below:
define service {
    use passive-service
    host_name foo
    service_description bar
    check_period 24x7
    max_check_attempts 3
    normal_check_interval 3
    retry_check_interval 1
    contact_groups linux-admins
    notification_interval 120
    notification_period 24x7
    notification_options w,u,c,r
    check_command check_dummy!0
}
Notes
- host_name is either the FQDN of the host_name registered to the Nagios plugin, or the host alias corresponding to Munin's
Best for Nagios admins who want specific details on plug-ins, September 4, 2006
- Paperback: 464 pages
- Publisher: No Starch Press; U.S. Ed edition (May 30, 2006)
- Language: English
- ISBN-10: 1593270704
- ISBN-13: 978-1593270704
- Product Dimensions: 9.2 x 7 x 1.1 inches
I recently received review copies of Pro Nagios 2.0 (PN2) by James Turnbull and Nagios: System and Network Monitoring (NSANM) by Wolfgang Barth. I read PN2 first, then NSANM. Both are excellent books, but I expect potential readers want to know which is best for them. The following is a radical simplification, and I could honestly recommend readers buy either (or both) books. If you are completely new to Nagios and want a very well-organized introduction, I recommend PN2. If you are somewhat familiar with Nagios and want detailed descriptions of a wide variety of Nagios plug-ins, I recommend NSANM.
By Richard Bejtlich "TaoSecurity.com" (Washington, DC)
NSANM strengths lie in the depth of coverage of certain elements when compared to PN2.
- PN2 devotes 7 pages to host checks, while NSANM's Ch 7 offers 21 pages.
- PN2 supplies 8 pages on service checks, but NSANM's Ch 6 gives 46 pages.
This level of detail can be very useful. For example, NSANM's explanation of check_squid also shows how to configure Squid to allow access to its cache manager.
NSANM shares more information on certain background protocols like SNMP. PN2's SNMP section is about 7 pages, whereas NSANM's Ch 11 is 36 pages. NSANM demonstrates more aspects of Nagios' Web interface and the CGI programs generating pages. I thought author Wolfgang Barth made very effective use of diagrams, like the network topology explanation in Ch 4, the service checks in Ch 5, and notification in Ch 12.
NSANM includes some material not mentioned in PN2, like using Nagios with Cygwin. Sometimes the books are very complementary, as shown by PN2's discussion of NSClient++ and NSANM's overview of NSClient and NC_Net.
NSANM is lacking coverage of security, redundancy, and failover, however. PN2 does address these critical issues. Beware that some of the "chapters" in NSANM are very short -- like Ch 8 (2 pages!) and Ch 19 (barely 6 pages). I think short sections like those should have been integrated into longer chapters or moved into the appendices.
Overall, NSANM is a very good book. I believe new Nagios readers should read PN2, and strongly consider NSANM as a complementary reference volume.
Nils Valentin (Tokyo, Japan)
Spot on for a well structured book with many WOW-factors, May 17, 2007
DISCLAIMER: This is a requested review by PTR, however any opinions expressed within the review are my personal ones. ---
Introduction - 6p
CHAPTER 1 Best Practices - 12p
CHAPTER 2 Theory of Operations - 26p
CHAPTER 3 Installing Nagios - 11p
CHAPTER 4 Configuring Nagios - 23p
CHAPTER 5 Bootstrapping the Configs - 10p
CHAPTER 6 Watching - 46p
CHAPTER 7 Visualization - 42p
CHAPTER 8 Nagios Event Broker Interface - 19p
APPENDIX A Configure Options - 3p
APPENDIX B nagios.cfg and cgi.cfg - 9p
APPENDIX C Command-Line Options - 10p
Index - 14p
At 190 pages (230 including appendices and index), the book is very compact. It teaches you Nagios in a way I have never heard or read before. I assume that the author's clearly structured style, which runs through the book like a red thread, is responsible for the excellent outcome.
The introduction opens with the title "Do it right the first time", and that hits the spot. What distinguishes this little portable knowledge base is its exceptionally well-thought-out content and the author's explanations. David is not filling pages by explaining each and every parameter, but rather showing you the big picture and explaining how to approach new issues or why one technical solution is better than another.
This is the book you should pass to your manager so (s)he understands why and how an open solution like Nagios is the better choice and can be used to achieve superior solutions.
The book itself is basically divided into two sections:
Background, setup and configuration - Chapters 1-5
Advanced Topics - Chapters 6-8
I found all of the chapters to have a nice balance of the amount of information needed, but some EXCEPTIONALLY good parts of the book were:
Chapter 1 Best practices
Chapter 2 - the part about scheduling
Chapters 6-8 as a whole
Chapter 6 has thorough explanations on monitoring the different OSes (especially the Windows part!!) and other applications.
Chapter 7 stands out for its overall thoroughness on how to visualize your data to reach a better understanding of the systems and network you are monitoring.
Chapter 8 describes a filesystem-based status interface. The NEB module will write a file with its current status code for each service. I have to admit that some technical details went over my head, but I thought that was pretty cool!!
The points featured above are what I found to be exceptionally good and most likely the strongest selling points of this little portable knowledge base. That doesn't mean that the other parts of the book are weak, mind you.
Funnily enough, the above-mentioned points were EXACTLY the points that I hadn't seen explained this thoroughly anywhere before.
So David's book was exactly spot on for me.
Summary:
To sum it all up in very simple words: This is a hell of a book !!
It's the most compact, well-structured book on Nagios that I have seen to date. It contains many WOW-factors. While reading each chapter you can virtually "feel" how David's explanations, tips and tricks have already helped you to avoid time-consuming pitfalls.
So this book is not about "to buy or not to buy"; this is an investment you don't want to miss!!
I was especially impressed by the thoroughness with which the book is written from the first page. Although the content of the first chapter wasn't new to me, the way it was explained already provided many of those A-ha moments.
The main asset of the book is not the description of the tools themselves, but rather the thought and consideration the author put into it and the sharing of those thoughts in a way that lets the reader visualize how and why one solution is better than another, without having the "luxury of experiencing the pitfalls" in a live disaster scenario.
PS: AFTER I finished reading the book I re-read the "Editorial Review" Amazon gave above and found it pretty well describing the actual book and what you should expect.
Nagios Enterprises LLC and its president Ethan Galstad are going through some changes. Galstad said he started the Nagios project outside of work in 1999, and in 2005 he realized he could support himself by working full time on the project. Now he has moved the project out of a home office and into rented office space.
As I reach him, he's in his new office, arranging the new furniture and recovering from a winter cold.
In September of 2009, Nagios released Nagios XI, the commercial version of its Nagios Core open source network management software. "There will always be the free version," Galstad says.
The commercial version incorporates the most popular community add-ons to Nagios, including one of the graphical interfaces, but Galstad says that users of the open source version have access to the same add-ons.
In addition to the graphical interface, a popular add-on is a graphing engine, which gives Nagios some of the graphing capabilities of Cacti. However, Galstad also admitted that many people use both Nagios and Cacti and that the two projects have a small amount of overlap.
Galstad says that people use open source products because they can add their own features. Many ISPs that use Nagios have written a script to check the border router; if the link is down, Nagios can power cycle the router automatically. But "anyone who's running an ISP has done that kind of thing before, whether they use Nagios or something else."
Galstad thinks that WISPA members may be interested to learn about MRTS out of Denmark, a custom version of the popular MRTG graphing engine that uses RRDTool to provide graphs over time. ISPs, of course, want to track monthly usage, and this tool can help.
Galstad says that many ISPs use Nagios in conjunction with a log manager like Syslog-ng. Commercial customers have asked for a feature that would allow Nagios to manage recurring downtime for some network elements.
WISPA members provided me with several questions, and one of those was how well Nagios LLC is doing with paid support. Galstad said that the reaction has not been as positive as he'd hoped, and he suspects that many large businesses feel they don't need premium support because they already have some of the best tech people in the business. So while premium support is not raking in the cash, Galstad finds that the commercial version of Nagios is being received well.
Galstad says that he appreciates the value of tech-savvy users of open source software. "People like ISPs take Nagios to the edge of what it can do. Without people like that, open source projects don't improve or evolve."
Back when he started Nagios, Galstad says he wanted to return a favor.
"I used Linux, Apache, MySQL, Perl, and the free C compiler. I wanted to give back to the open source community."
The Last but not Least Technology is dominated by two types of people: those who understand what they do not manage and those who manage what they do not understand ~Archibald Putt. Ph.D
Copyright © 1996-2021 by Softpanorama Society. www.softpanorama.org was initially created as a service to the (now defunct) UN Sustainable Development Networking Programme (SDNP) without any remuneration. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License. Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.
FAIR USE NOTICE This site contains copyrighted material the use of which has not always been specifically authorized by the copyright owner. We are making such material available to advance understanding of computer science, IT technology, economic, scientific, and social issues. We believe this constitutes a 'fair use' of any such copyrighted material as provided by section 107 of the US Copyright Law according to which such material can be distributed without profit exclusively for research and educational purposes.
This is a Spartan WHYFF (We Help You For Free) site written by people for whom English is not a native language. Grammar and spelling errors should be expected. The site contains some broken links as it develops like a living tree...
Disclaimer:
The statements, views and opinions presented on this web page are those of the author (or referenced source) and are not endorsed by, nor do they necessarily reflect, the opinions of the Softpanorama society. We do not warrant the correctness of the information provided or its fitness for any purpose. The site uses AdSense, so you need to be aware of Google's privacy policy. If you do not want to be tracked by Google, please disable JavaScript for this site. This site is perfectly usable without JavaScript.
Last modified: September 14, 2019