Once you have decided to monitor a device with Nagios, such as a server, router, switch, or printer, you will need to take three steps for each device. This article assumes you have a working Nagios server.
The process illustrated here for checking a router is the same one you will use to check servers, switches, printers, etc. It is always a three-step process.
Step #1: Create a Host Entry
Each host must have an IP address on your network. That should be a static IP address, not a dynamic one, because Nagios ties the IP address to the hostname. So when you create a host entry, two elements are required for the host itself: IP address and hostname.
IP Address: 192.168.5.79
It is important to recognize that the hostname is simply a name the administrator gives to the machine for visual recognition. You could, for example, choose hostnames that encode the location of each device.
Location Based on Office Building
rtbl3a = router (rt), building 3 (bl3), first router (a)
You get the idea: create names that let you recognize the location at a glance, without referring to a chart.
alias zyzel router
Another element of the host definition is the reference to “generic-switch.” This is a template that minimizes the amount of text required for each host.
This is the “generic-switch” template found in templates.cfg. Note that it sets many of the values you want, so you do not have to repeat them for each host. It also calls another template, “generic-host,” so templates may use other templates as well.
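A host entry built from the pieces above might look roughly like the following. This is a minimal sketch: the host_name rtbl3a is an assumption borrowed from the naming example, and only the alias and address come from this article.

```
define host{
    use         generic-switch      ; inherit defaults from the template
    host_name   rtbl3a              ; assumed name, borrowed from the naming example
    alias       zyzel router        ; longer description shown in the web interface
    address     192.168.5.79        ; static IP address of the device
    }
```

Everything not listed here, such as the check interval and the notification settings, is supplied by the template.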
define host{
    name                    generic-switch      ; Name of this host template
    use                     generic-host        ; Inherit default values from the generic-host template
    check_period            24x7                ; Switches are monitored round the clock
    check_interval          5                   ; Switches are checked every 5 minutes
    retry_interval          1                   ; Schedule host check retries at 1 minute intervals
    max_check_attempts      10                  ; Check each switch 10 times (max)
    check_command           check-host-alive    ; Default command to check whether the host is alive
    notification_period     24x7                ; Send notifications at any time
    notification_interval   30                  ; Resend notifications every 30 minutes
    notification_options    d,r                 ; Send notifications only for down and recovery states
    contact_groups          admins              ; Notifications are sent to the admins by default
    register                0                   ; DON'T REGISTER THIS - IT'S JUST A TEMPLATE
    }
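Because templates can use other templates, you could layer a site-specific template on top of generic-switch and change only the values that differ. The sketch below is hypothetical: the name building3-switch and the two overridden values are invented for illustration and do not come from templates.cfg.

```
define host{
    name                    building3-switch    ; hypothetical template name
    use                     generic-switch      ; inherit everything from generic-switch
    check_interval          10                  ; overrides the inherited value of 5
    notification_interval   60                  ; overrides the inherited value of 30
    register                0                   ; still a template, not a real host
    }
```

A value set locally takes precedence over the same directive inherited from a template, so a host using building3-switch would be checked every 10 minutes.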
Step #2: Create a Service Entry
Even if you only want to verify that a host is up and running, you still have to create a service for that to work correctly. Every service you define must reference a command that describes how the check is performed. The service_description is important because Nagios uses this text string in the web interface, and it also matters when you use graphing options like NagiosGrapher, so you will eventually want each service_description to be unique. The check_command must be tied back to a command definition in commands.cfg.
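A service entry for the router could be sketched as follows. The host_name, the service_description, and the threshold values passed to check_ping are all assumptions made for illustration; adjust them to your own host entry and network.

```
define service{
    use                     generic-service     ; inherit defaults from the template
    host_name               rtbl3a              ; assumed; must match the host entry
    service_description     PING                ; label shown in the web interface
    check_command           check_ping!200.0,20%!600.0,60%   ; illustrative thresholds
    }
```

The text after each "!" in check_command is handed to the command definition as an argument, which is how one command can serve many services with different thresholds.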
The “generic-service” template contains a lot of information. Most of the definitions are straightforward, but a few items need explanation. “Freshness” checking lets Nagios decide that the most recently determined status is out of date, in which case it performs an active test. For this to work it must be enabled at the service level with check_freshness (note that the template shown here leaves it off) and at the global level in nagios.cfg with check_service_freshness.
Services that show an error status only once are called “volatile”. If you turn on is_volatile, Nagios treats each error as if it had just occurred. This matters when you are checking logs, for example: with a volatile service, every time an error is repeated in the logs it is logged again and triggers notification and event handling again.
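As a sketch of the two places freshness has to be switched on, the fragment below shows the global option followed by a service that overrides the template. The host name, service description, threshold, and the no_report_received command are hypothetical and would need matching definitions of your own.

```
# nagios.cfg: global switch for service freshness checking
check_service_freshness=1

# Object configuration: a service whose results normally arrive passively
define service{
    use                     generic-service
    host_name               rtbl3a              ; hypothetical host name
    service_description     Backup Report       ; hypothetical
    active_checks_enabled   0                   ; no regularly scheduled active checks
    check_freshness         1                   ; overrides the template's 0
    freshness_threshold     3600                ; seconds before a result counts as stale
    check_command           no_report_received  ; hypothetical; run when the result is stale
    }
```

With these settings, a result older than one hour would cause Nagios to run the check_command itself instead of waiting for the next passive result.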
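A volatile log check might be declared along these lines. This is only a sketch: the service description and the check_router_log command are hypothetical, and the command would need its own definition in commands.cfg.

```
define service{
    use                     generic-service
    host_name               rtbl3a              ; hypothetical host name
    service_description     Log Errors          ; hypothetical
    is_volatile             1                   ; treat every non-OK result as a new problem
    max_check_attempts      1                   ; reach a hard state on the first failure
    check_command           check_router_log    ; hypothetical; must exist in commands.cfg
    }
```

Setting max_check_attempts to 1 is a common companion to is_volatile, since a log error that appears once should not have to repeat before Nagios acts on it.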
define service{
    name                            generic-service ; The 'name' of this service template
    active_checks_enabled           1               ; Active service checks are enabled
    passive_checks_enabled          1               ; Passive service checks are enabled
    parallelize_check               1               ; Active service checks should be parallelized (disabling this can lead to major performance problems)
    obsess_over_service             1               ; We should obsess over this service
    check_freshness                 0               ; Default is to NOT check 'freshness'
    notifications_enabled           1               ; Service notifications are enabled
    event_handler_enabled           1               ; Service event handler is enabled
    flap_detection_enabled          1               ; Flap detection is enabled
    failure_prediction_enabled      1               ; Failure prediction is enabled
    process_perf_data               1               ; Process performance data
    retain_status_information       1               ; Retain status information
    retain_nonstatus_information    1               ; Retain non-status information
    is_volatile                     0               ; Not volatile
    check_period                    24x7            ; Service can be checked at any time
    max_check_attempts              3               ; Re-check the service up to 3 times to reach a hard state
    normal_check_interval           10              ; Check the service every 10 minutes
    retry_check_interval            2               ; Re-check every two minutes until a hard state is determined
    contact_groups                  admins          ; Notifications are sent to the admins
    notification_options            w,u,c,r         ; Send warning, unknown, critical, recovery
    notification_interval           60              ; Re-notify about problems every hour
    notification_period             24x7            ; Notifications can be sent out at any time
    register                        0               ; DON'T REGISTER THIS DEFINITION
    }
Step #3: Create a Commands Definition
The check_command must be defined in commands.cfg. Here is the example of check_ping, which has a command_name and then a command_line that provides the structure of the command:
$USER1$ path to the plugins directory
/check_ping actual plugin name
-H $HOSTADDRESS$ the host's IP address, taken from the host entry
-w $ARG1$ warning threshold, passed as the first argument of the check_command
-c $ARG2$ critical threshold, passed as the second argument of the check_command
-p 5 send 5 ping packets
define command{
    command_name    check_ping
    command_line    $USER1$/check_ping -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ -p 5
    }
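As a rough illustration of how the macros are filled in, suppose a service calls the command as check_ping!200.0,20%!600.0,60% for the host at 192.168.5.79. The thresholds and the plugin path assigned to $USER1$ are assumptions chosen for this example; your resource.cfg may point somewhere else.

```
# resource.cfg (assumed plugin path; yours may differ)
$USER1$=/usr/local/nagios/libexec

# In the service definition, arguments are separated by "!":
#   check_command   check_ping!200.0,20%!600.0,60%
#   $ARG1$ = 200.0,20%   (warn at 200 ms round trip time or 20% packet loss)
#   $ARG2$ = 600.0,60%   (critical at 600 ms round trip time or 60% packet loss)
#
# With the host address 192.168.5.79, Nagios would then run:
#   /usr/local/nagios/libexec/check_ping -H 192.168.5.79 -w 200.0,20% -c 600.0,60% -p 5
```

Running the expanded command by hand from a shell on the Nagios server is a quick way to confirm that the plugin and the thresholds behave as expected before relying on them.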