Contributed by deanna
Over the weekend, one of our OpenBSD servers, an internet router running OpenBGPD, had a fan die. Thanks to the sensors framework, and the Nagios plugin I wrote, I found out it was broken, and I could also tell that the rest of the fans in the server were doing a fine job keeping it cool. That means I was able to replace the fan at my convenience. Without the monitoring, I would probably not have noticed the fan being out until more fans died and the server overheated and failed.
After this, I got excited about the sensors again and updated the check so that it can also check sensors that report their own status. Since many sensors support this, it can make your sensorsd.conf much smaller. For example, check_hw_sensors will automatically check these two sensors:
hw.sensors.76=esm0, Fan 4, OK, fanrpm, 3629 RPM
hw.sensors.77=esm0, Fan 5, CRITICAL, fanrpm, 0 RPM

It will report the status listed to Nagios. For 76, it would report OK; for 77, it would report CRITICAL. You don't need to put anything in a config file to support those.
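To make the status mapping concrete, here is a minimal sketch in Python. It is not the actual check_hw_sensors plugin. It assumes the five-field line layout seen in the two example sensors (device, name, status, type, value), and the names `parse_sensor_line` and `check` are made up for illustration. The only outside fact it relies on is the usual Nagios plugin exit-code convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
# Usual Nagios plugin exit codes.
NAGIOS_CODES = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}


def parse_sensor_line(line):
    """Parse one 'hw.sensors.N=device, name, status, type, value' line.

    Assumption: the five-field layout from the example lines. Returns None
    for anything that does not match, such as a sensor without a status field.
    """
    key, sep, rest = line.strip().partition("=")
    if not sep:
        return None
    # Split into at most five fields so a comma inside the value survives.
    fields = [f.strip() for f in rest.split(",", 4)]
    if len(fields) != 5 or fields[2] not in NAGIOS_CODES:
        return None
    device, name, status, kind, value = fields
    return {"key": key, "device": device, "name": name,
            "status": status, "type": kind, "value": value}


def check(lines):
    """Return (exit_code, message) for the sensor with the highest exit code."""
    sensors = [s for s in map(parse_sensor_line, lines) if s is not None]
    if not sensors:
        return NAGIOS_CODES["UNKNOWN"], "UNKNOWN: no status-reporting sensors found"
    worst = max(sensors, key=lambda s: NAGIOS_CODES[s["status"]])
    message = "{status}: {device} {name} ({value})".format(**worst)
    return NAGIOS_CODES[worst["status"]], message


sample = [
    "hw.sensors.76=esm0, Fan 4, OK, fanrpm, 3629 RPM",
    "hw.sensors.77=esm0, Fan 5, CRITICAL, fanrpm, 0 RPM",
]
code, message = check(sample)
print(code, message)
```

A real plugin would read the lines from sysctl and exit with the returned code so Nagios can pick up the state; the sketch only shows the parsing and the mapping from the sensor's own status to that code.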
I have this check running on 10 servers with a variety of different hardware, checking a total of 273 sensors. It sure makes me sleep better knowing that if something breaks, I will get a text message on my cell phone letting me know.
The variety of hardware includes:
- 2 ISP1100
- 5 Dell PowerEdge 2450
- 1 Dell PowerEdge 4300
- 1 Dell PowerEdge 6450
- 1 Whitebox Celeron 300