How to Monitor a Device with Nagios

by Mike on May 2, 2010


Once you have decided to monitor a device with Nagios, such as a server, router, switch, or printer, you will need to take three steps for each device. This article assumes you have a working Nagios server.
The process illustrated here, checking a router, is the same process you will use to check servers, switches, printers, etc. It is always a three-step process.


Monitoring A Device With Nagios

Step #1: Create a Host Entry

Each host must have an IP address on your network. That should be a static IP address, not a dynamic one, because Nagios ties the IP address to the hostname. So when you create a host entry, two elements are required for the host itself: IP address and hostname.

IP Address:
Hostname: zyxel

It is important to recognize that the hostname is simply a name the administrator gives to the machine for visual recognition. You could, for example, choose hostnames that encode a device's location.

Location Based on Office Building
rtbl3a = router(rt) building 3(bl3) first router (a)

You get the idea: create names that let you recognize a device's location at a glance, without referring to a chart.

define host{
    use                     generic-switch
    host_name               zyxel
    alias                   zyxel router
    notifications_enabled   0
    }
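
The definition above does not show the address directive, which is what ties the hostname to the device's IP address. A minimal sketch of the same host with it included follows; 192.0.2.1 is a placeholder from the reserved documentation range, not the router's actual address.

```cfg
define host{
    use                     generic-switch
    host_name               zyxel
    alias                   zyxel router
    address                 192.0.2.1       ; placeholder, replace with the device's static IP
    notifications_enabled   0
    }
```

After editing, running nagios -v followed by the path to your nagios.cfg should report any syntax errors before you restart Nagios.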

Notice that the host definition also refers to “generic-switch” through the use directive. This is a template that minimizes the amount of text required for each host.

This is the “generic-switch” template found in templates.cfg. Note that it sets many of the options you want but will not have to repeat for every host. It also calls another template, “generic-host,” so templates may use other templates as well.

define host{
    name                    generic-switch      ; Name of this host template
    use                     generic-host        ; Inherit default values from the generic-host template
    check_period            24x7                ; Switches are monitored round the clock
    check_interval          5                   ; Switches are checked every 5 minutes
    retry_interval          1                   ; Schedule host check retries at 1 minute intervals
    max_check_attempts      10                  ; Check each switch 10 times (max)
    check_command           check-host-alive    ; Default command to check if the switch is alive
    notification_period     24x7                ; Send notifications at any time
    notification_interval   30                  ; Resend notifications every 30 minutes
    notification_options    d,r                 ; Send notifications for specific host states
    contact_groups          admins              ; Notifications are sent to the admins by default
    register                0                   ; DONT REGISTER THIS - ITS JUST A TEMPLATE
    }
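
Because a host inherits everything it does not set itself, a value placed directly in the host definition overrides the template. As a sketch, a hypothetical router (the name rtbl3a follows the naming example earlier, and the address is a placeholder) could be checked more often than the template's 5 minutes:

```cfg
define host{
    use                     generic-switch  ; inherit everything else from the template
    host_name               rtbl3a          ; hypothetical host
    alias                   Building 3 router A
    address                 192.0.2.2       ; placeholder address
    check_interval          1               ; overrides the template's value of 5
    }
```

All other settings, such as check_command and contact_groups, would still come from generic-switch and, through it, from generic-host.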

Step #2: Create a Service Entry

You may only want to verify that a host is up and running, but you still have to create a service for that to work correctly. Note also that every service must have a corresponding command definition that describes how the check will run. The service_description is very important because Nagios uses this text string in the web interface, and it has implications when you use graphing options like NagiosGrapher. For that reason you may eventually want service_descriptions that are unique. The check_command must be tied back to a command definition in commands.cfg.

define service{
    use                     generic-service
    host_name               zyxel
    service_description     PING
    check_command           check_ping!200.0,20%!600.0,60%
    normal_check_interval   5
    retry_check_interval    1
    }

The “generic-service” template contains a lot of information. Most of the definitions are straightforward, but a few items need explanation. “Freshness” checking is how Nagios decides that the most recently determined status is out of date, in which case it performs an active check. For this to work it must be enabled at the service level with check_freshness (the template below leaves it off by default) and at the global level in nagios.cfg:

check_service_freshness=1
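
As a sketch of the service-level side, the hypothetical service below turns freshness checking on and sets freshness_threshold, the age in seconds after which a result counts as stale. The service name and the one-hour threshold are assumptions for illustration, and check_ping is reused only because it is the command defined in this article.

```cfg
define service{
    use                     generic-service
    host_name               zyxel
    service_description     Passive Heartbeat   ; hypothetical service fed by passive results
    active_checks_enabled   0                   ; no regularly scheduled active checks
    check_freshness         1                   ; overrides the generic-service default of 0
    freshness_threshold     3600                ; seconds before the last result is considered stale
    check_command           check_ping!200.0,20%!600.0,60%  ; run when the result goes stale
    }
```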

Services that show an error status only once are called “volatile”. If you turn on is_volatile, Nagios will treat each error as if it had just occurred. This matters when you are checking logs, for example: if the service is volatile, an error that is repeated in the logs will be logged, trigger a notification, and trigger event handling each time.

define service{
    name                            generic-service ; The 'name' of this service template
    active_checks_enabled           1               ; Active service checks are enabled
    passive_checks_enabled          1               ; Passive service checks are enabled
    parallelize_check               1               ; Active service checks should be parallelized (disabling this can lead to major performance problems)
    obsess_over_service             1               ; We should obsess over this service
    check_freshness                 0               ; Default is to NOT check 'freshness'
    notifications_enabled           1               ; Service notifications are enabled
    event_handler_enabled           1               ; Service event handler is enabled
    flap_detection_enabled          1               ; Flap detection is enabled
    failure_prediction_enabled      1               ; Failure prediction is enabled
    process_perf_data               1               ; Process performance data
    retain_status_information       1               ; Retain status information
    retain_nonstatus_information    1               ; Retain non-status information
    is_volatile                     0               ; Not volatile
    check_period                    24x7            ; Service can be checked at any time
    max_check_attempts              3               ; Re-check the service up to 3 times (hard)
    normal_check_interval           10              ; Check the service every 10 minutes
    retry_check_interval            2               ; Re-check every two minutes until a hard state is determined
    contact_groups                  admins          ; Notifications get sent out to the admins
    notification_options            w,u,c,r         ; Send warning, unknown, critical, recovery
    notification_interval           60              ; Re-notify about problems every hour
    notification_period             24x7            ; Notifications can be sent out at any time
    register                        0               ; DONT REGISTER THIS DEFINITION
    }
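
To illustrate is_volatile, here is a sketch of a log-monitoring service that overrides the template's default. The command name check_log_errors is hypothetical and would need its own definition in commands.cfg; max_check_attempts is set to 1 so that every error immediately becomes a hard state.

```cfg
define service{
    use                     generic-service
    host_name               zyxel
    service_description     Log Errors          ; hypothetical service
    check_command           check_log_errors    ; hypothetical command, define it in commands.cfg
    is_volatile             1                   ; treat every error as if it just occurred
    max_check_attempts      1                   ; no retries before the state becomes hard
    }
```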

Step #3: Create a Command Definition

The check_command must be defined in commands.cfg. Here is the example of check_ping, which has a command_name and a command_line that provides the structure of the command. The command_line breaks down as follows:

$USER1$               path to the plugins directory
/check_ping           actual plugin name
-H $HOSTADDRESS$      IP address of the host being checked
-w $ARG1$             warning threshold, supplied as the first argument
-c $ARG2$             critical threshold, supplied as the second argument
-p 5                  send 5 ping packets

define command{
    command_name    check_ping
    command_line    $USER1$/check_ping -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ -p 5
    }
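
Putting the three steps together, Nagios splits the service's check_command on the ! character and substitutes the pieces into the command_line. The trace below sketches that for the PING service. The plugin path and the host address are assumptions: $USER1$ is normally set in resource.cfg and the value shown is only a typical source-install location, and 192.0.2.1 is a placeholder address.

```cfg
# resource.cfg (assumed value, check your own installation)
$USER1$=/usr/local/nagios/libexec

# service:  check_command  check_ping!200.0,20%!600.0,60%
#   command name   check_ping
#   $ARG1$         200.0,20%    (warning at 200 ms round trip or 20% packet loss)
#   $ARG2$         600.0,60%    (critical at 600 ms round trip or 60% packet loss)
#
# resulting command line:
#   /usr/local/nagios/libexec/check_ping -H 192.0.2.1 -w 200.0,20% -c 600.0,60% -p 5
```

Running the resulting command line by hand as the Nagios user is a quick way to confirm that the plugin and thresholds behave as expected before relying on the service check.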
