Monitoring OSX and Linux with Nagios and SNMP
OSX
OSX is somewhat unusual, both because it is not a commonly monitored server platform and because you are effectively working with two operating systems: a BSD-derived core under the hood (Darwin, which borrows heavily from FreeBSD) and OSX built on top. However, these examples can also be used for Linux distros with a few modifications. SNMP was selected for creating the checks because it is the easiest to deploy and requires minimal time on the servers to be monitored.
Configure SNMP on Client
SNMP must be configured on the client to allow Nagios to monitor it. Even though SNMP is considered an agentless method of monitoring, it does require configuration, and possibly installation, of the SNMP daemon on the client. In this example, as in most cases, SNMP has already been installed but is neither configured nor running. The first step is to configure SNMP to allow access to the OIDs on the box. Verify that SNMP is installed:
ls /usr/sbin/snmpd
Note that the snmpd daemon uses UDP port 161 by default once it is started. Configuration of snmpd can be done with the interactive snmpconf command. Do not be intimidated by the number of entries required; simply work your way through them, as it is not as difficult as it seems.
sudo snmpconf -g basic_setup
Or create the file manually. Here are the basics of what you will need:
##### SNMP Configuration #####
##### Community String #####
com2sec notConfigUser 192.168.5.0/24 public
##### Security Name #####
group notConfigGroup v1 notConfigUser
group notConfigGroup v2c notConfigUser
##### View of Tree #####
view all included .1 80
#####
access notConfigGroup "" any noauth exact all none none
Save this file and restart the snmpd daemon.
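When the same configuration has to be deployed to several clients, it can help to generate it rather than retype it. The following is only a minimal sketch: the function name make_snmpd_conf is hypothetical, and it simply prints the directives shown above with the network and community string filled in from its two arguments.

```shell
#!/bin/sh
# Hypothetical helper (not part of net-snmp): print a minimal read-only
# snmpd.conf that mirrors the directives listed above.
#   $1 = network allowed to query, in CIDR form
#   $2 = community string
make_snmpd_conf() {
    network=$1
    community=$2
    cat <<EOF
com2sec notConfigUser $network $community
group notConfigGroup v1 notConfigUser
group notConfigGroup v2c notConfigUser
view all included .1 80
access notConfigGroup "" any noauth exact all none none
EOF
}

# Example use: write a candidate file and review it before installing it.
# make_snmpd_conf 192.168.5.0/24 public > /tmp/snmpd.conf.candidate
```

Writing to a candidate file first keeps a typing mistake from replacing a working configuration.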
Starting SNMP
Each time changes are made, snmpd must be restarted for those changes to take effect. Here is an example using load and unload to start and stop the snmpd daemon.
sudo launchctl load -w /System/Library/LaunchDaemons/org.net-snmp.snmpd.plist
sudo launchctl unload -w /System/Library/LaunchDaemons/org.net-snmp.snmpd.plist
On CentOS
service snmpd restart
SNMP Commands
These plugins can all be found at:
http://exchange.nagios.org/directory/Plugins
check_snmp_storage.pl
Checking storage is somewhat different on OSX than on a Linux box. The mounted partitions are still listed by the df command, but the mount points are different, so take those into account.
df -h
Filesystem      Size   Used  Avail Capacity  Mounted on
/dev/disk0s2   465Gi  8.6Gi  456Gi     2%    /
map auto_home    0Bi    0Bi    0Bi   100%    /home
/dev/disk1s2   465Gi  427Mi  465Gi     1%    /Volumes/Secondary HD
Examples
Check the commands that you want to use with SNMP from the command line first to verify that they work as expected. To perform the checks, go to the plugin directory and execute the checks as the nagios user.
su - nagios
cd /usr/local/nagios/libexec
Check storage with "-m" as the partition indicator.
./check_snmp_storage.pl -H 66.219.169.70 -C osxro -m /home -w 80 -c 90
/home: 0%used(0MB/0MB) (<80%) : OK
The “-r” option turns off regular-expression matching so that only the exact name given with “-m” is matched, which eliminates other partitions. In this example the / partition is the only one returned.
./check_snmp_storage.pl -H 66.219.169.70 -C osxro -m / -r -w 80 -c 90
/: 2%used(8803MB/476120MB) (<80%) : OK
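The way the "-w" and "-c" values turn a usage percentage into a state can be sketched as follows. This is an illustration only, not the plugin's own code: the function name storage_state is hypothetical, and treating a value equal to the threshold as a breach is an assumption. The exit codes 0, 1 and 2 are the standard Nagios plugin codes for OK, WARNING and CRITICAL.

```shell
#!/bin/sh
# Hypothetical sketch of threshold handling for a storage check.
#   $1 = used percentage (integer)
#   $2 = warning threshold, $3 = critical threshold
# Assumption: a value equal to the threshold counts as a breach.
storage_state() {
    used=$1
    warn=$2
    crit=$3
    if [ "$used" -ge "$crit" ]; then
        echo "CRITICAL"
        return 2
    elif [ "$used" -ge "$warn" ]; then
        echo "WARNING"
        return 1
    fi
    echo "OK"
    return 0
}

# Example: storage_state 2 80 90 prints OK, as in the / check above.
```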
check_snmp_process.pl
SNMP can be used to monitor individual processes and the resources they consume. The typical needs of each process should be measured first so that sensible WARNING and CRITICAL thresholds can be set.
Examples
Test all of the checks before they are deployed, as OSX has issues with some checks that work normally on Linux. To perform the checks, go to the plugin directory and execute the checks as the nagios user.
su - nagios
cd /usr/local/nagios/libexec
snmpd
This example tests whether at least one snmpd process is running.
./check_snmp_process.pl -H 66.219.169.70 -C osxro -n snmpd
1 process matching snmpd (> 0)
This check verifies that snmpd is using less than 5 MB of memory (typical usage) to be in an OK state and less than 8 MB of memory to stay out of a CRITICAL state.
./check_snmp_process.pl -H 66.219.169.70 -C osxro -n snmpd -m 5,8
1 process matching snmpd (> 0), Mem : 4.1Mb OK
syslogd
syslogd typically uses less memory than snmpd, so the WARNING and CRITICAL thresholds should be adjusted accordingly.
./check_snmp_process.pl -H 66.219.169.70 -C osxro -n syslogd -m 1,3
1 process matching syslogd (> 0), Mem : 0.8Mb OK
check_snmp_load.pl
The load on a server needs to be checked, as it is one of the bottlenecks that can cause problems. This can be done effectively with SNMP.
Examples
The “-T” option is an important option to provide so that the check understands the nuances of the operating system. In this example, “-T netsl” is used, which employs the typical Linux method of testing load using 1 minute, 5 minute and 15 minute averages. That is reflected in the output.
./check_snmp_load.pl -H 66.219.169.70 -C osxro -w 8,7,6 -c 10,9,9 -T netsl
Load : 0.00 0.00 0.00 : OK
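The comma-separated thresholds pair up with the three load averages in order (1, 5 and 15 minutes). A rough sketch of that comparison is given below. The function name load_state is hypothetical, and counting a value equal to the threshold as a breach is an assumption rather than the plugin's documented behaviour.

```shell
#!/bin/sh
# Hypothetical sketch: compare three load averages with per-interval
# warning and critical thresholds.
#   $1 = "l1 l5 l15"   loads, space separated
#   $2 = "w1,w5,w15"   warning thresholds
#   $3 = "c1,c5,c15"   critical thresholds
# Assumption: a load equal to the threshold counts as a breach.
load_state() {
    echo "$1 $2 $3" | tr ',' ' ' | awk '{
        # fields 1-3 are loads, 4-6 warning levels, 7-9 critical levels
        state = "OK"
        for (i = 1; i <= 3; i++) {
            if ($i + 0 >= $(i + 6) + 0) { state = "CRITICAL"; break }
            if ($i + 0 >= $(i + 3) + 0) state = "WARNING"
        }
        print state
    }'
}

# Example: load_state "0.00 0.00 0.00" "8,7,6" "10,9,9"
```

Using lower thresholds for the longer averages, as in "-w 8,7,6", means a sustained load trips the check sooner than a short spike.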
check_snmp_int.pl
Monitoring a network interface for bandwidth and status can be done with this plugin. The plugin also allows for the checking of errors on the interface.
Testing
This example includes the in and out octets for bandwidth and the speed of the network port.
./check_snmp_int.pl -H 66.219.169.70 -C osxro -n en0 -f -S
en0:UP:1 UP: OK | 'en0_in_octet'=34437306c 'en0_out_octet'=29511027c 'en0_speed_bps'=100000000
Measurement of bandwidth can be achieved with the “-k” option. The thresholds are given as input,output pairs: here the WARNING state is reached above 400 Kbps in or 600 Kbps out, and the CRITICAL state above 700 Kbps out (a value of 0 disables that threshold).
./check_snmp_int.pl -H 66.219.169.70 -C osxro -n en0 -k -w 400,600 -c 0,700
en0:UP (0.3KBps/0.5KBps):1 UP: OK
Bandwidth on “en0” is monitored in this example as well, but the thresholds are measured in Mbps (“-M”) instead of Kbps, which is the default when no unit is set. Gigabits can be used with the “-G” option.
./check_snmp_int.pl -H 66.219.169.70 -C osxro -n en0 -k -w 400,600 -c 0,800 -M
en0:UP (0.0MBps/0.0MBps):1 UP: OK
Errors can be monitored on the port using the “-f” and “-e” options.
./check_snmp_int.pl -H 66.219.169.70 -C osxro -n en0 -f -e
en0:UP:1 UP: OK | 'en0_in_octet'=34570753c 'en0_out_octet'=29727445c 'en0_in_error'=0c 'en0_in_discard'=0c 'en0_out_error'=0c 'en0_out_discard'=0c
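Everything after the "|" in the plugin output is performance data in the form 'label'=value followed by an optional unit (the "c" marks a counter). When testing from the command line it can be handy to pull out a single value. The sketch below does that with standard tools. The function name perfdata_value is hypothetical, and it assumes labels contain no spaces, which holds for the output shown above.

```shell
#!/bin/sh
# Hypothetical helper: print the numeric value of one perfdata label.
#   $1 = full plugin output line
#   $2 = label, without the surrounding quotes
# Assumption: labels contain no spaces.
perfdata_value() {
    # 1. drop everything up to and including the first "|"
    # 2. put each label=value pair on its own line
    # 3. print the digits that follow the requested label
    printf '%s\n' "$1" |
        sed 's/^[^|]*|//' |
        tr ' ' '\n' |
        sed -n "s/^'\{0,1\}$2'\{0,1\}=\([0-9.]*\).*/\1/p"
}

# Example:
# out=$(./check_snmp_int.pl -H 66.219.169.70 -C osxro -n en0 -f -e)
# perfdata_value "$out" en0_in_error
```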
check_snmp
This is the default plugin for SNMP. One skill that will need to be acquired is the ability to research OIDs using snmpget and snmpwalk, as administrators must often create customized checks.
Examples
Uptime
snmpget -v2c -c osxro 66.219.169.70 sysUpTime.0
DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (42901296) 4 days, 23:10:12.96
It is important to use the numerical OID rather than the MIB name to save resources. The numerical OID can be located by using the -On option.
snmpget -v2c -c osxro 66.219.169.70 sysUpTime.0 -On
.1.3.6.1.2.1.1.3.0 = Timeticks: (42915130) 4 days, 23:12:31.30
So instead of using the text option, like this:
./check_snmp -H 66.219.169.70 -C osxro -o sysUpTime.0
SNMP OK - Timeticks: (42776729) 4 days, 22:49:27.29 |
Use the numerical option.
snmpget -v2c -c osxro 66.219.169.70 .1.3.6.1.2.1.1.3.0
DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (42923677) 4 days, 23:13:56.77
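The number in parentheses is the raw value: Timeticks count hundredths of a second, so 42923677 ticks is roughly 429236 seconds, which matches the 4 days, 23:13:56 shown. If a custom check needs uptime in seconds (for example to alert on a recent reboot), the conversion can be sketched as below. The function name timeticks_to_seconds is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: convert an snmpget Timeticks line to whole seconds.
#   $1 = a line such as "... = Timeticks: (42923677) 4 days, 23:13:56.77"
timeticks_to_seconds() {
    # pull out the number inside the parentheses
    ticks=$(printf '%s\n' "$1" |
        sed -n 's/.*Timeticks: (\([0-9][0-9]*\)).*/\1/p')
    # fail if the line did not contain a Timeticks value
    [ -n "$ticks" ] || return 1
    # Timeticks are hundredths of a second
    echo $((ticks / 100))
}

# Example:
# timeticks_to_seconds "$(snmpget -v2c -c osxro 66.219.169.70 .1.3.6.1.2.1.1.3.0)"
```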
sysLocation
snmpget 66.219.169.70 -v2c -c osxro 1.3.6.1.2.1.1.6.0
SNMPv2-MIB::sysLocation.0 = STRING: "Data Center"
check_by_snmp
The check_by_snmp plugin is a flexible plugin which will help solve difficult issues you may face with checks on a host. It provides a way to execute commands and scripts on the remote server.
Examples
Edit snmpd.conf, add the entry, save and restart.
extend dfstuff /bin/df -h
./check_by_snmp.pl -H 66.219.169.70 -C osxro -E dfstuff
Filesystem      Size   Used  Avail Capacity  Mounted on
/dev/disk0s2   465Gi  8.6Gi  456Gi     2%    /
devfs          183Ki  183Ki    0Bi   100%    /dev
map -hosts       0Bi    0Bi    0Bi   100%    /net
map auto_home    0Bi    0Bi    0Bi   100%    /home
/dev/disk1s2   465Gi  427Mi  465Gi     1%    /Volumes/Secondary HD
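For troubleshooting, it can be useful to know where snmpd publishes the output of an extend entry. In net-snmp the full output appears under NET-SNMP-EXTEND-MIB::nsExtendOutputFull, indexed by the extend name encoded as its length followed by the ASCII code of each character. The sketch below builds that numeric OID. The function name extend_output_oid is hypothetical, and whether check_by_snmp.pl queries this exact OID internally is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: build the numeric OID under which net-snmp exposes
# the full output of an "extend NAME ..." entry.
#   $1 = extend name as written in snmpd.conf, e.g. dfstuff
extend_output_oid() {
    name=$1
    # NET-SNMP-EXTEND-MIB::nsExtendOutputFull
    base=.1.3.6.1.4.1.8072.1.3.2.3.1.2
    # od prints each byte of the name as an unsigned decimal number;
    # tr and sed turn the whitespace-separated list into a dotted one
    bytes=$(printf '%s' "$name" | od -An -tu1 |
        tr -s ' \n' '..' | sed 's/^\.//; s/\.$//')
    # index = length of the name, then one sub-identifier per character
    printf '%s.%s.%s\n' "$base" "${#name}" "$bytes"
}

# Example:
# snmpget -v2c -c osxro 66.219.169.70 "$(extend_output_oid dfstuff)"
```

If the snmpget returns the df output but the plugin does not, the problem is on the Nagios side rather than in snmpd.conf.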
Using a Script
If you have scripts that would be useful, you can reference the scripts from the snmpd.conf file using extend.
Edit snmpd.conf, add the entry, save and restart.
extend mem /bin/sh /usr/share/snmp/mem.sh
Create a simple script that will be accessed from snmpd.conf.
mem.sh
#!/bin/sh
top -l 1 | grep PhysMem: | awk '{print $10}'
There are multiple aspects of memory you may want to monitor.
PhysMem (installed RAM)
  wired: information in RAM that cannot be moved to hard drive
  active: information in RAM that has recently been used
  inactive: information in RAM that has not been recently used
  used: total RAM used
  free: unused memory
VM: total amount of virtual memory for all processes
Page ins/Page outs: amount of information moved between RAM and hard drive
  * Page out is when data must be written to disk because of limited RAM
Swap used: amount of information copied to swap file
In order to use these different aspects of memory, a script must be created to give access to the data. The key tool here is the top command, whose output must be cut down to the required data using awk.
Capture active memory
Create a simple script that will be accessed from snmpd.conf. Note the 4th field is captured.
active.sh
#!/bin/sh
top -l 1 | grep PhysMem: | awk '{print $4}'
Create a simple script that will be accessed from snmpd.conf. Note the 2nd field is captured from the line “VM”.
total_vm.sh
#!/bin/sh
top -l 1 | grep VM: | awk '{print $2}'
Create a simple script that will be accessed from snmpd.conf. Note the 7th field is captured from the line “VM”.
pageins.sh
#!/bin/sh
top -l 1 | grep VM: | awk '{print $7}'
Create a simple script that will be accessed from snmpd.conf. Note the 9th field is captured from the line “VM”.
pageouts.sh
#!/bin/sh
top -l 1 | grep VM: | awk '{print $9}'
Create a simple script that will be accessed from snmpd.conf. Note the 4th field is captured from the line “Swap”.
swap.sh
#!/bin/sh
top -S -l 1 > /tmp/swap
grep Swap /tmp/swap | awk '{print $4}'
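The scripts above rely on fixed field positions, which can shift if the layout of the top output changes between OSX releases. One alternative, sketched here, is to look a value up by the label that follows it. The function name top_value is hypothetical, and the sample line in the comment is an assumption about the PhysMem layout, chosen to be consistent with the field numbers used above ($4 for active, $10 for free).

```shell
#!/bin/sh
# Hypothetical helper: print the value that precedes a given label on a
# top summary line read from standard input.
#   $1 = label such as "active", "free" or "pageins"
top_value() {
    awk -v label="$1" '{
        for (i = 2; i <= NF; i++) {
            word = $i
            gsub(/[.,]/, "", word)   # remove trailing punctuation
            if (word == label) { print $(i - 1); exit }
        }
    }'
}

# Assumed sample line:
#   PhysMem: 504M wired, 1293M active, 345M inactive, 2142M used, 1953M free.
# Example use in a script referenced from snmpd.conf:
#   top -l 1 | grep PhysMem: | top_value active
```

Always run the script by hand on the monitored host first and compare its output with top before adding the extend entry.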