Monitoring OSX and Linux with Nagios and SNMP

OSX

OSX is an unusual monitoring target for two reasons: it is not commonly deployed as a server, and you are actually working with two layers, a BSD-derived Unix (Darwin, which borrows heavily from FreeBSD) under the hood and OSX built on top. These examples can also be used on Linux distros with a few modifications. SNMP is used for the checks because it is the easiest method to deploy and requires minimal time on the servers to be monitored.

Configure SNMP on Client

SNMP must be configured on the client before Nagios can monitor it. Even though SNMP is considered an agentless method of monitoring, it does require configuration and possibly installation of the SNMP daemon. In this example, as in most cases, SNMP is already installed but is neither configured nor running. The first step is to configure SNMP to allow access to the OIDs on the box.

Verify that SNMP is installed:

ls /usr/sbin/snmpd

Note that the snmpd daemon listens on UDP port 161 by default once it is started.

snmpd can be configured with the interactive snmpconf command. Do not be intimidated by the number of entries that are required; simply work your way through them, as it is not as difficult as it seems.

sudo snmpconf -g basic_setup

Or create the file manually. Here are the basics of what you will need:

##### SNMP Configuration #####

##### Community String #####

com2sec notConfigUser 192.168.5.0/24 public

##### Security Name #####

group notConfigGroup v1 notConfigUser

group notConfigGroup v2c notConfigUser

##### View of Tree #####

view all included .1 80

##### Access #####

access notConfigGroup "" any noauth exact all none none

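Save this file and restart the snmpd daemon. Note that the community string set here (public) must match the value passed to the plugins with -C; the examples below use osxro.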
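As a rough sketch, the same basic file can be generated from a small shell function so that the allowed subnet and the community string are not hard-coded. The function name render_snmpd_conf and its two arguments are hypothetical and not part of net-snmp; the directives are the ones listed above.

```sh
# Hypothetical helper: print a basic snmpd.conf for one subnet and community.
# $1 = subnet allowed to query (e.g. 192.168.5.0/24)
# $2 = read-only community string
render_snmpd_conf() {
    subnet=$1
    community=$2
    cat <<EOF
com2sec notConfigUser $subnet $community
group notConfigGroup v1 notConfigUser
group notConfigGroup v2c notConfigUser
view all included .1 80
access notConfigGroup "" any noauth exact all none none
EOF
}
```

For example, render_snmpd_conf 192.168.5.0/24 osxro prints a file whose community matches the -C osxro used by the plugin examples; redirect the output to your snmpd.conf.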

Starting SNMP

Each time the configuration is changed, snmpd must be restarted for the changes to take effect. On OSX this is done with launchctl, where load starts the snmpd daemon and unload stops it.

sudo launchctl load -w /System/Library/LaunchDaemons/org.net-snmp.snmpd.plist

sudo launchctl unload -w /System/Library/LaunchDaemons/org.net-snmp.snmpd.plist

On CentOS:

service snmpd restart
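A restart on OSX is simply an unload followed by a load. The sketch below only prints the commands for a given platform so they can be reviewed before being run. The function name restart_cmds is made up for this illustration, and the plist path is the one shown above.

```sh
PLIST=/System/Library/LaunchDaemons/org.net-snmp.snmpd.plist

# Print (do not run) the commands that restart snmpd.
# $1 is the kernel name as reported by `uname -s`.
restart_cmds() {
    case "$1" in
        Darwin)
            echo "sudo launchctl unload -w $PLIST"
            echo "sudo launchctl load -w $PLIST"
            ;;
        Linux)
            echo "service snmpd restart"
            ;;
        *)
            echo "unsupported platform: $1" >&2
            return 1
            ;;
    esac
}
```

Calling restart_cmds "$(uname -s)" on the monitored host shows the commands that apply to it.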

SNMP Commands

These plugins can all be found at:

http://exchange.nagios.org/directory/Plugins

check_snmp_storage.pl

Checking storage is somewhat different on OSX than on a Linux box. The mounted partitions are still listed by the df command, but the mount points are different, so take those into account when choosing what to check.

df -h

Filesystem       Size   Used  Avail  Capacity  Mounted on
/dev/disk0s2    465Gi  8.6Gi  456Gi        2%  /
map auto_home     0Bi    0Bi    0Bi      100%  /home
/dev/disk1s2    465Gi  427Mi  465Gi        1%  /Volumes/Secondary HD

Examples

Check the commands that you want to use with SNMP from the command line first to verify that they work as expected. To perform the checks, go to the plugin directory and execute the checks as the nagios user.

su - nagios

cd /usr/local/nagios/libexec

Check storage with "-m" as the partition indicator.

./check_snmp_storage.pl -H 66.219.169.70 -C osxro -m /home -w 80 -c 90

/home: 0%used(0MB/0MB) (<80%) : OK

The “-r” option turns off regular-expression matching on the “-m” value, so other partitions that merely contain the name are eliminated. In this example the / partition is the only one in the return.

./check_snmp_storage.pl -H 66.219.169.70 -C osxro -m / -r -w 80 -c 90

/: 2%used(8803MB/476120MB) (<80%) : OK
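To make the meaning of -w 80 -c 90 concrete, here is a minimal sketch of how used and total sizes could be turned into a percentage, a state, and the standard Nagios plugin exit code (0 OK, 1 WARNING, 2 CRITICAL). The function name storage_state is hypothetical, and the use of "greater than or equal" for the comparison is an assumption based on the "(<80%)" shown in the output, not taken from the plugin source.

```sh
# Hypothetical helper: classify disk usage against warning/critical percentages.
# $1 = used MB, $2 = total MB, $3 = warning %, $4 = critical %
# Prints the state and returns the matching Nagios exit code.
storage_state() {
    awk -v u="$1" -v t="$2" -v w="$3" -v c="$4" 'BEGIN {
        # An empty filesystem (0MB/0MB, like /home above) counts as 0% used.
        pct = (t > 0) ? 100 * u / t : 0
        if (pct >= c)      { state = "CRITICAL"; code = 2 }
        else if (pct >= w) { state = "WARNING";  code = 1 }
        else               { state = "OK";       code = 0 }
        printf "%s: %.0f%% used\n", state, pct
        exit code
    }'
}
```

With the figures from the output above, storage_state 8803 476120 80 90 reports 2% used and an OK state.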

check_snmp_process.pl

SNMP can be used to monitor individual processes and the resources they consume. The typical needs of each process should be tested first so that sensible WARNING and CRITICAL thresholds can be set.

Examples

Test all of the checks before they are added to Nagios, as OSX has issues with some checks that work normally on Linux. To perform the checks, go to the plugin directory and execute the checks as the nagios user.

su - nagios

cd /usr/local/nagios/libexec

snmpd

This example checks that at least one snmpd process is running.

./check_snmp_process.pl -H 66.219.169.70 -C osxro -n snmpd

1 process matching snmpd (> 0)

This check verifies that snmpd is using less than 5 MB of memory (its typical usage) to be in an OK state. Above 5 MB the check goes to WARNING, and above 8 MB it goes to CRITICAL.

./check_snmp_process.pl -H 66.219.169.70 -C osxro -n snmpd -m 5,8

1 process matching snmpd (> 0), Mem : 4.1Mb OK

syslogd

syslogd typically uses less memory than snmpd, so the WARNING and CRITICAL thresholds should be adjusted accordingly.

./check_snmp_process.pl -H 66.219.169.70 -C osxro -n syslogd -m 1,3

1 process matching syslogd (> 0), Mem : 0.8Mb OK

check_snmp_load.pl

The load on a server certainly needs to be checked, as it is one of the bottlenecks that can cause problems. SNMP handles this effectively.

Examples

The “-T” option is important because it tells the check which type of system it is querying. In this example “-T netsl” is used, which employs the typical Linux method of reporting load as 1 minute, 5 minute and 15 minute averages. That is reflected in the output.

./check_snmp_load.pl -H 66.219.169.70 -C osxro -w 8,7,6 -c 10,9,9 -T netsl

Load : 0.00 0.00 0.00 : OK
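The three-part thresholds line up position by position with the 1, 5 and 15 minute averages. The sketch below shows that pairing. The function name load_state is hypothetical, and the strict "greater than" comparison is an assumption rather than something confirmed from the plugin.

```sh
# Hypothetical helper: compare three load averages with three-part thresholds.
# $1 = "l1 l5 l15", $2 = "w1,w5,w15", $3 = "c1,c5,c15"
load_state() {
    awk -v loads="$1" -v warn="$2" -v crit="$3" 'BEGIN {
        split(loads, l, " ")
        split(warn, w, ",")
        split(crit, c, ",")
        state = "OK"
        for (i = 1; i <= 3; i++) {
            # Any average over its critical limit wins immediately.
            if (l[i] + 0 > c[i] + 0) { state = "CRITICAL"; break }
            if (l[i] + 0 > w[i] + 0) state = "WARNING"
        }
        print state
    }'
}
```

For the example above, load_state "0.00 0.00 0.00" 8,7,6 10,9,9 prints OK, while a 15 minute average of 9.5 would be CRITICAL because it exceeds the third critical value of 9.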

check_snmp_int.pl

Monitoring a network interface for bandwidth and status can be done with this plugin. The plugin also allows for the checking of errors on the interface.

Testing

This example includes the in and out octets for bandwidth and the speed of the network port.

./check_snmp_int.pl -H 66.219.169.70 -C osxro -n en0 -f -S

en0:UP:1 UP: OK | 'en0_in_octet'=34437306c 'en0_out_octet'=29511027c 'en0_speed_bps'=100000000

Bandwidth can be measured with the “-k” option. The “-w” and “-c” values are input,output pairs: here the WARNING state is raised above 400 KBps inbound or 600 KBps outbound, and the CRITICAL state above 700 KBps outbound, with the 0 disabling the inbound CRITICAL threshold.

./check_snmp_int.pl -H 66.219.169.70 -C osxro -n en0 -k -w 400,600 -c 0,700

en0:UP (0.3KBps/0.5KBps):1 UP: OK

Bandwidth on “en0” is monitored again in this example, but the thresholds are measured in MBps (“-M”) instead of KBps, which is used if no unit is set. Giga units can be selected with the “-G” option.

./check_snmp_int.pl -H 66.219.169.70 -C osxro -n en0 -k -w 400,600 -c 0,800 -M

en0:UP (0.0MBps/0.0MBps):1 UP: OK

Errors can be monitored on the port using the “-f” and “-e” options.

./check_snmp_int.pl -H 66.219.169.70 -C osxro -n en0 -f -e

en0:UP:1 UP: OK | 'en0_in_octet'=34570753c 'en0_out_octet'=29727445c 'en0_in_error'=0c 'en0_in_discard'=0c 'en0_out_error'=0c 'en0_out_discard'=0c
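The octet values in the performance data are running counters, so bandwidth is the difference between two readings divided by the time between them. Here is a minimal sketch of that arithmetic. The function name rate_kbytes, the 32-bit counter wrap, and the 1024 bytes per KB divisor are all assumptions for illustration and are not taken from the plugin.

```sh
# Hypothetical helper: average KB per second between two octet counter readings.
# $1 = earlier reading, $2 = later reading, $3 = seconds between them
rate_kbytes() {
    awk -v prev="$1" -v curr="$2" -v secs="$3" 'BEGIN {
        if (secs <= 0) exit 1
        delta = curr - prev
        # Assumed 32-bit counter: a smaller later value means it wrapped.
        if (delta < 0) delta += 4294967296
        printf "%.1f\n", delta / secs / 1024
    }'
}
```

Taking the two en0_in_octet readings shown in this section and supposing, purely for illustration, that they were taken 300 seconds apart, rate_kbytes 34437306 34570753 300 gives the average inbound rate over that interval.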

check_snmp*

This is the default plugin for SNMP. Because administrators often must create customized checks, one of the skills that will need to be acquired is the ability to research OIDs using snmpget and snmpwalk.

Examples

Uptime

snmpget -v2c -c osxro 66.219.169.70 sysUpTime.0

DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (42901296) 4 days, 23:10:12.96

It is important to use the numerical OID rather than the MIB name, as this saves resources. The numerical OID can be located by using the -On option.

snmpget -v2c -c osxro 66.219.169.70 sysUpTime.0 -On

.1.3.6.1.2.1.1.3.0 = Timeticks: (42915130) 4 days, 23:12:31.30

So instead of using the text name, like this:

./check_snmp -H 66.219.169.70 -C osxro -o sysUpTime.0

SNMP OK - Timeticks: (42776729) 4 days, 22:49:27.29 |

Use the numerical OID. It is verified here with snmpget, and the same OID can then be passed to check_snmp with -o.

snmpget -v2c -c osxro 66.219.169.70 .1.3.6.1.2.1.1.3.0

DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (42923677) 4 days, 23:13:56.77
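Timeticks are hundredths of a second, which is how the number in parentheses relates to the readable uptime beside it. The sketch below pulls the tick count out of an output line and converts it. Both function names are hypothetical, and the formatting simply mirrors the output seen above.

```sh
# Hypothetical helper: extract the raw tick count from an output line on stdin.
ticks_from_line() {
    sed -n 's/.*Timeticks: (\([0-9][0-9]*\)).*/\1/p'
}

# Hypothetical helper: convert Timeticks (1/100 s) to "D days, H:MM:SS.cc".
ticks_to_uptime() {
    awk -v t="$1" 'BEGIN {
        cs = t % 100                 # leftover hundredths of a second
        secs = int(t / 100)
        d = int(secs / 86400); secs -= d * 86400
        h = int(secs / 3600);  secs -= h * 3600
        m = int(secs / 60);    s = secs - m * 60
        if (d == 0)      printf "%d:%02d:%02d.%02d\n", h, m, s, cs
        else if (d == 1) printf "1 day, %d:%02d:%02d.%02d\n", h, m, s, cs
        else             printf "%d days, %d:%02d:%02d.%02d\n", d, h, m, s, cs
    }'
}
```

Running ticks_to_uptime 42901296 reproduces the 4 days, 23:10:12.96 from the first snmpget above. The raw tick count is the more convenient value when a custom check needs to alert on a recent reboot.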

sysLocation

snmpget 66.219.169.70 -v2c -c osxro 1.3.6.1.2.1.1.6.0

SNMPv2-MIB::sysLocation.0 = STRING: "Data Center"

check_by_snmp

The check_by_snmp plugin is a flexible plugin that helps solve difficult issues you may face with checks on a host. It provides a way to execute commands and scripts on the remote server.

Examples

Edit snmpd.conf, add the entry, save and restart.

extend dfstuff /bin/df -h

./check_by_snmp.pl -H 66.219.169.70 -C osxro -E dfstuff

Filesystem       Size   Used  Avail  Capacity  Mounted on
/dev/disk0s2    465Gi  8.6Gi  456Gi        2%  /
devfs           183Ki  183Ki    0Bi      100%  /dev
map -hosts        0Bi    0Bi    0Bi      100%  /net
map auto_home     0Bi    0Bi    0Bi      100%  /home
/dev/disk1s2    465Gi  427Mi  465Gi        1%  /Volumes/Secondary HD
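Raw df output like this is awkward to alert on, so an extend entry could instead point at a script that filters it. The following is a minimal sketch, assuming the column layout shown above (a single percentage column followed by the mount point). The function name df_over is hypothetical.

```sh
# Hypothetical helper: read df output on stdin and list mounts at or above
# the given capacity percentage. Mount points may contain spaces.
df_over() {
    awk -v limit="$1" 'NR > 1 {
        for (i = 2; i <= NF; i++) {
            if ($i ~ /^[0-9]+%$/) {
                pct = $i + 0                      # "100%" becomes 100
                mount = $(i + 1)
                for (j = i + 2; j <= NF; j++) mount = mount " " $j
                if (pct >= limit) print pct "% " mount
                break
            }
        }
    }'
}
```

For example, df -h | df_over 90 lists only the filesystems that are at least 90% full. Pseudo filesystems such as /dev and /net always report 100%, so a real check would probably want to skip them.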

Using a Script

If you have scripts that would be useful, you can reference the scripts from the snmpd.conf file using extend.

Edit snmpd.conf, add the entry, save and restart.

extend mem /bin/sh /usr/share/snmp/mem.sh

Create a simple script that will be accessed from snmpd.conf.

mem.sh

#!/bin/sh

top -l 1 | grep PhysMem: | awk '{print $10}'

There are multiple aspects of memory you may want to monitor.

PhysMem (installed RAM):

wired: information in RAM that cannot be moved to the hard drive

active: information in RAM that has recently been used

inactive: information in RAM that has not been recently used

used: total RAM used

free: unused memory

VM: total amount of virtual memory for all processes

Page ins/Page outs: amount of information moved between RAM and the hard drive

* A page out occurs when data must be written to disk because of limited RAM

Swap used: amount of information copied to the swap file

To use these different aspects of memory, a script will need to be created for each value to make the data accessible. The key tool here is the top command, with awk used to cut out the field that is required.

Capture active memory

Create a simple script that will be accessed from snmpd.conf. Note the 4th field is captured.

active.sh

#!/bin/sh

top -l 1 | grep PhysMem: | awk '{print $4}'

Create a simple script that will be accessed from snmpd.conf. Note the 2nd field is captured from the line “VM”.

total_vm.sh

#!/bin/sh

top -l 1 | grep VM: | awk '{print $2}'

Create a simple script that will be accessed from snmpd.conf. Note the 7th field is captured from the line “VM”.

pageins.sh

#!/bin/sh

top -l 1 | grep VM: | awk '{print $7}'

Create a simple script that will be accessed from snmpd.conf. Note the 9th field is captured from the line “VM”.

pageouts.sh

#!/bin/sh

top -l 1 | grep VM: | awk '{print $9}'

Create a simple script that will be accessed from snmpd.conf. Note the 4th field is captured from the line “Swap”.

swap.sh

#!/bin/sh

top -S -l 1 | grep Swap | awk '{print $4}'
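The scripts above depend on fixed field numbers, which break if top changes the order of its output. As an alternative sketch, the value can be looked up by the label that follows it. The function name value_before is hypothetical, and it assumes each value in the top line is immediately followed by its label, as the field numbers used above imply.

```sh
# Hypothetical helper: read one line of top output on stdin and print the
# token that sits directly before the given label (e.g. active, free, pageins).
value_before() {
    awk -v label="$1" '{
        for (i = 2; i <= NF; i++) {
            word = $i
            gsub(/[,.]$/, "", word)     # drop trailing comma or period
            if (word == label) { print $(i - 1); exit }
        }
    }'
}
```

With this helper, active.sh could become top -l 1 | grep PhysMem: | value_before active, and pageouts.sh could become top -l 1 | grep VM: | value_before pageouts.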
