Wednesday, December 14, 2011

Cacti > 1 minute polling - Drawback

Generally, Cacti's 1-minute polling can be a great improvement in monitoring your network. You can monitor hundreds of hosts and get their status every minute without relying on the Realtime plugin. You can also pinpoint the minute in which a host changed status from up to down, or vice versa. This is very helpful for responding to faults quickly: if a host goes down in the first minute, you don't need to wait another 4 minutes to find out. Those 4 minutes can be long enough to resolve a minor connection problem. Moreover, 1-minute polling detects intermittent connections better than 5-minute polling, which is often too coarse to be an accurate basis for troubleshooting them.

Unfortunately, 1-minute polling has one critical weakness: when most of your hosts (hundreds or so) are down, Cacti's poller takes longer to poll the downed hosts, and polling may not complete within 1 minute. This causes gaps in some of your graphs where there should be none. The poller starts a new polling cycle at the end of every 60 seconds regardless, without having finished polling the remaining hosts, especially those with higher device IDs.
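To see why a mass outage can push a cycle past 60 seconds, here is a rough back-of-the-envelope sketch in Python. It is not how Cacti or Spine actually schedules work; it assumes each down host costs a fixed wait (timeouts and retries) and that the work is spread evenly over the worker threads. The function name and all the numbers are made up for illustration.

```python
def estimate_poll_seconds(down_hosts, up_hosts, workers, down_cost_s, up_cost_s):
    """Rough estimate of one polling cycle, in seconds.

    Assumption: every down host costs down_cost_s (timeouts plus retries),
    every up host costs up_cost_s, and the total work is divided evenly
    across `workers` concurrent threads. Real pollers are less tidy.
    """
    total_work_s = down_hosts * down_cost_s + up_hosts * up_cost_s
    return total_work_s / workers


def fits_in_interval(estimate_s, interval_s=60):
    """True if the estimated cycle finishes before the next one starts."""
    return estimate_s < interval_s


# Hypothetical numbers: 300 hosts down at 2 s each, 100 hosts up at 0.25 s each,
# 10 worker threads in total.
outage = estimate_poll_seconds(300, 100, 10, 2.0, 0.25)
normal = estimate_poll_seconds(0, 400, 10, 2.0, 0.25)
print(outage, fits_in_interval(outage))
print(normal, fits_in_interval(normal))
```

With these made-up values, the outage case lands just over the 60-second interval while the normal case finishes with plenty of room, which matches the behavior described above: the cycle only overruns when many hosts are down at once.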

My current fix is to increase Spine's maximum threads per process by 50% from my previous setting. I hope this will be enough to poll all hosts completely when a major outage happens again. The runtime reported in Cacti's poller stats is now under 15 seconds, down from 25 seconds.
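The sizing behind a change like this can be sketched in a few lines of Python. This is only an illustration under the assumption that polling work divides evenly across threads; the helper names and the sample numbers are hypothetical and are not my actual settings.

```python
import math


def threads_after_increase(current_threads, increase=0.5):
    """Thread count after raising the current setting by `increase` (0.5 = 50%)."""
    return math.ceil(current_threads * (1 + increase))


def min_workers_for_budget(total_work_s, budget_s):
    """Smallest number of workers that fits total_work_s of polling into budget_s.

    Assumption: the work splits evenly across workers, which real polling
    only approximates.
    """
    return math.ceil(total_work_s / budget_s)


# Hypothetical example: 10 threads today, 625 s of total polling work during
# an outage, and a 60 s polling interval to stay under.
new_threads = threads_after_increase(10)
needed = min_workers_for_budget(625, 60)
print(new_threads, needed, new_threads >= needed)
```

If the new thread count is at or above the minimum the worst case needs, the cycle should fit in the interval under this simplified model. In practice it is worth leaving extra headroom, since the even-split assumption is optimistic.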
