Two scripts have let me extend Nagios more easily than almost any other monitoring configuration over the last 10 years. They have allowed me to create monitors within the applications that I have built and within existing applications. A few times they have helped me to solve complex monitoring problems where providers supplied ineffective documentation.
This article assumes that you have an understanding of Nagios.
- check_http_status.sh – allows me to write code behind any URL that returns one of the predefined strings (STATE_OK, STATE_WARNING, STATE_CRITICAL) and a message that helps to determine the error.
- check_http_content.sh – allows me to search a web page for a string. If that string does not exist, the check returns an error.
Simple, right? Does it already exist? Maybe. Have I recreated something? Maybe again. But I have been using it for the last 10 years and it has stood the test of time. It is simple, easy to call, and uses existing infrastructure. Web programmers can make as many hooks as needed for the
One time I was working for a telco, and the ISDN connections had a tendency to drop out at the most inconvenient times. We were always on the back foot, reacting to the problem only when our customers reported it to us. How to fix it? This was old equipment and there was little documentation for the SNMP traps. But there was a web page that showed a red light icon when the ISDN lines had a problem. check_http_content.sh allowed me to search for the green icon (the monitor is listed below). Within half an hour I had solved all of our ISDN monitoring issues without having to sift through endless Google searches trying to find the correct SNMP trap.
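The idea behind the ISDN check can be tried without the real equipment. This is only a sketch: `page_contains` is a hypothetical helper, the two HTML snippets are made-up sample data, and it uses `grep -F` (fixed-string matching) rather than the plain `grep` of the script listed further down, so that the dot in the file name is matched literally.

```shell
#!/bin/bash
# page_contains: hypothetical helper, succeeds when the needle appears in the HTML.
page_contains() {
    local html="$1" needle="$2"
    # -q: no output, only the exit status; -F: treat the needle as a fixed string
    printf '%s\n' "$html" | grep -qF -- "$needle"
}

# Made-up sample pages standing in for the device's status page.
sample_ok='<td><img src="greenISDNIcon.gif"></td>'
sample_bad='<td><img src="redISDNIcon.gif"></td>'

if page_contains "$sample_ok" "greenISDNIcon.gif"; then echo "OK"; else echo "CRITICAL"; fi
if page_contains "$sample_bad" "greenISDNIcon.gif"; then echo "OK"; else echo "CRITICAL"; fi
```

Searching for the healthy marker rather than the faulty one means that a page which fails to render at all also raises an alert.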
The other script that has been incredibly useful (check_http_status.sh) allows me to write hooks in all of the web apps. This means that all of the complex monitoring logic can live in the web application itself (DevOps?).
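What a hook has to produce is small: a message plus one of the state keywords. Here is a rough sketch of that logic in shell, assuming a numeric metric with warning and critical thresholds. `hook_status`, the `queued_jobs` label and the threshold values are all hypothetical and only illustrate the output format.

```shell
#!/bin/bash
# hook_status: hypothetical hook logic.
# Arguments: label, current value, warning threshold, critical threshold (integers).
hook_status() {
    local label="$1" value="$2" warn="$3" crit="$4"
    if [ "$value" -ge "$crit" ]; then
        echo "$label=$value - STATE_CRITICAL"
    elif [ "$value" -ge "$warn" ]; then
        echo "$label=$value - STATE_WARNING"
    else
        echo "$label=$value - STATE_OK"
    fi
}

# Served as a CGI script, this output would follow a Content-Type header and a blank line.
hook_status "queued_jobs" 12 50 100
```

Because the thresholds live in the hook, the people who know the application decide what counts as a warning, and the Nagios side stays generic.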
Pros and Cons
The downside is that the monitoring server adds load to your web server. This can be controlled with the check interval configuration in Nagios. It is a small price to pay for such an easy way to monitor your systems. Anything can be monitored: processes, database sizes, event frequency, cash flow, service tickets. Anything that you can write a program for can be monitored in Nagios.
You have to consider security when you write these scripts. It was not a problem for me as I was on a private network. You can control access by whitelisting your monitoring server’s IP, or you can add authentication to your scripts when you call curl.
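Adding authentication to the curl call could look roughly like this. It is a sketch under assumptions: `fetch_hook` is a hypothetical wrapper, and `NAGIOS_HOOK_USER`, `NAGIOS_HOOK_PASS` and `CURL_BIN` are variable names invented for the example. It relies only on curl's standard `--user` option for HTTP basic authentication.

```shell
#!/bin/bash
# fetch_hook: hypothetical wrapper around curl.
# NAGIOS_HOOK_USER, NAGIOS_HOOK_PASS and CURL_BIN are assumed variable names,
# not something Nagios or curl defines.
fetch_hook() {
    local url="$1"
    local curl_bin="${CURL_BIN:-curl}"
    if [ -n "${NAGIOS_HOOK_USER:-}" ]; then
        # --user sends HTTP basic auth credentials with the request
        "$curl_bin" --silent --fail --max-time 30 \
            --user "${NAGIOS_HOOK_USER}:${NAGIOS_HOOK_PASS:-}" "$url"
    else
        "$curl_bin" --silent --fail --max-time 30 "$url"
    fi
}

# Dry run: point CURL_BIN at echo to print the arguments instead of making a request.
CURL_BIN=echo
NAGIOS_HOOK_USER=nagios
NAGIOS_HOOK_PASS=example-password
fetch_hook "http://192.168.3.200:8080/LiveStats/nagiosCheckJobResults.php?HOST=mpcsyd"
```

Basic authentication over plain HTTP sends the credentials unencrypted, so outside a private network it is worth serving the hooks over HTTPS as well.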
If you need some assistance implementing this with your DevOps team, please contact us.
nagiosCheckDatabase.php – Example web hook for checking that a database exists, in this case an Oracle database
<?php
// nagiosCheckDatabase.php
// Request, DBTable and DB are not PHP built-ins; presumably they are loaded by config.php.
require_once ( dirname ( __FILE__ ) . '/config.php' );

$dbName = Request::get ( 'HOST' );

// Run a trivial query against the database named in the HOST parameter.
$tab = new DBTable ( $dbName, 'SELECT SYSDATE FROM DUAL', null, DB::FETCH_NUM );

if ( ! $tab->ok() ) {
    echo "Unable to query Database STATE_CRITICAL";
} else {
    echo "SYSDATE=" . $tab->getValue() . " - STATE_OK";
}
myservers.cfg – Example Nagios service configuration for check_http_status and check_http_content
define service {
    use                     generic-service
    host_name               sydney-mpcsyd
    service_description     Job Results
    check_command           check_http_status!http://192.168.3.200:8080/LiveStats/nagiosCheckJobResults.php?HOST=mpcsyd
    normal_check_interval   60
    retry_check_interval    15
    max_check_attempts      3
}

define service {
    use                     generic-service
    host_name               sydney-rev-au-pocmp3
    service_description     ISDN OCMP3
    check_command           check_http_content!http://192.168.3.130:4242/this.BMPFFaultMgr?GetMapAction=HTML&LEVEL=TOP_LEVEL&TYPE=1&NAME=Root&DATE=0&LEV_NUM=0&LEV_NAME0=N0&LEV_NAME1=N1&LEV_NAME2=N2&LEV_NAME3=N3&LEV_TYPE0=T0&LEV_TYPE1=T1&LEV_TYPE2=T2&LEV_TYPE3=T3!greenISDNIcon.gif
}
commands.cfg – This is the Nagios configuration that connects the services to the scripts
define command {
    command_name    check_http_status
    command_line    /etc/nagios/scripts/check_http_status.sh '$ARG1$'
}

define command {
    command_name    check_http_content
    command_line    /etc/nagios/scripts/check_http_content.sh '$ARG1$' '$ARG2$'
}
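In a service definition, the fields of check_command are separated by `!`: the first is the command name, and the following ones become `$ARG1$`, `$ARG2$` and so on. The sketch below only mimics that mapping for illustration. It is not how Nagios itself parses the line, `split_check_command` is a hypothetical helper, and the URL is a placeholder.

```shell
#!/bin/bash
# split_check_command: hypothetical illustration of how "!"-separated fields
# line up with the command name, $ARG1$ and $ARG2$. Not Nagios's own parser.
split_check_command() {
    CMD_NAME="${1%%!*}"          # everything before the first "!"
    ARG1=""
    ARG2=""
    case "$1" in
        *!*) ;;                  # at least one argument follows
        *)   return 0 ;;         # command name only
    esac
    local rest="${1#*!}"         # drop the command name and the first "!"
    ARG1="${rest%%!*}"
    case "$rest" in
        *!*) ARG2="${rest#*!}"; ARG2="${ARG2%%!*}" ;;
    esac
}

# Placeholder URL; the real service definitions use the device's status page.
split_check_command 'check_http_content!http://example.invalid/status.html!greenISDNIcon.gif'
echo "command=$CMD_NAME"
echo "ARG1=$ARG1"
echo "ARG2=$ARG2"
```

One consequence of this separator is that a URL or search string containing a literal `!` cannot be passed this way without escaping.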
check_http_status.sh
#!/bin/bash
# Nagios plugin exit codes
STATE_OK=0
STATE_WARNING=1
STATE_CRITICAL=2
STATE_UNKNOWN=3
STATE_DEPENDENT=4

if test -x /usr/bin/printf; then
    ECHO=/usr/bin/printf
else
    ECHO=echo
fi

URL=$1
# -f makes curl exit non-zero on HTTP error responses; the URL is quoted so
# characters such as ? and & are passed through untouched.
RESP=$(curl --silent --connect-timeout 300 --retry 3 -f "$URL")
RES=$?

if [ "$RES" != "0" ]
then
    echo "Unable to connect to $URL ($RES)"
    exit $STATE_WARNING
else
    echo "$URL: $RESP"
    # Look for the state keyword the hook printed; anything else is a warning.
    if echo "$RESP" | grep -q STATE_OK
    then
        exit $STATE_OK
    elif echo "$RESP" | grep -q STATE_WARNING
    then
        exit $STATE_WARNING
    elif echo "$RESP" | grep -q STATE_CRITICAL
    then
        exit $STATE_CRITICAL
    else
        exit $STATE_WARNING
    fi
fi
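The keyword matching in check_http_status.sh can be tried out without a web server. The following is a minimal sketch, not part of the original script: `state_from_body` is a hypothetical helper name, and it assumes the same precedence as the script (OK, then WARNING, then CRITICAL, with WARNING as the fallback).

```shell
#!/bin/bash
# state_from_body: hypothetical helper that maps a hook's response text to a
# Nagios exit code (0 = OK, 1 = WARNING, 2 = CRITICAL).
state_from_body() {
    local body="$1"
    case "$body" in
        *STATE_OK*)       return 0 ;;
        *STATE_WARNING*)  return 1 ;;
        *STATE_CRITICAL*) return 2 ;;
        *)                return 1 ;;  # no keyword found: treated as WARNING
    esac
}

state_from_body "SYSDATE=01-JAN-20 - STATE_OK"
echo "exit code: $?"
```

Note that the first matching keyword wins, so a response containing both STATE_OK and STATE_CRITICAL would be reported as OK. Hooks should print exactly one keyword.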
check_http_content.sh
#!/bin/bash
# Nagios plugin exit codes
STATE_OK=0
STATE_WARNING=1
STATE_CRITICAL=2
STATE_UNKNOWN=3
STATE_DEPENDENT=4

if test -x /usr/bin/printf; then
    ECHO=/usr/bin/printf
else
    ECHO=echo
fi

URL=$1
PROCESS=$2    # the string to search for in the page
# -f makes curl exit non-zero on HTTP error responses
RESP=$(curl --silent -f "$URL")
RES=$?

if [ "$RES" != "0" ]
then
    echo "Unable to connect to $URL ($RES)"
    exit $STATE_WARNING
else
    if echo "$RESP" | grep -q "$PROCESS"
    then
        echo "String ($PROCESS) exists in URL: $URL"
        exit $STATE_OK
    else
        echo "Could not find: String ($PROCESS) in URL: $URL"
        exit $STATE_CRITICAL
    fi
fi