Steve Shipway
2010-10-03 22:18:26 UTC
Has anyone on the list any thoughts on the MRTG check scheduler?
Currently (we're considering Daemon mode only, here), every 5 mins it will
run ALL the Target checks sequentially, running multiple threads according
to the Forks: directive. After all checks are finished, it will sleep until
the next 5-min cycle starts.
This is sub-optimal because
1) You get a huge burst of CPU usage followed by a period of silence,
which can make the frontend slow and skews monitoring of the system's
own CPU
2) If the checks exceed the 5min window, then you miss a polling cycle
and need to manually tune your forks upwards.
I would propose an alternative method of scheduling.
1. Rather than specifying a number of forks, make it a MAXIMUM number
(a bit like when defining threads in apache)
2. After the initial read of the CFG files, MRTG knows how many
Targets there are. Divide the Interval by this to get the interleave (for
example, 600 Targets in a 300-second Interval gives 0.5 seconds). Then,
start a new check every interleave, starting a new thread only if one is
needed and we've not hit the maximum threads.
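
As a rough illustration of the proposal above (not MRTG code, and in Python
rather than MRTG's Perl), the interleave and the thread cap could be sketched
like this. The function names, the fixed per-check duration and the simulation
itself are assumptions made purely for the example.

```python
import heapq


def interleave_schedule(targets, interval=300.0):
    """Spread the targets evenly over one polling interval.

    Returns a list of (start_offset_seconds, target) pairs.
    """
    if not targets:
        return []
    step = interval / len(targets)  # the "interleave"
    return [(i * step, target) for i, target in enumerate(targets)]


def simulate_cycle(schedule, check_duration, max_workers):
    """Simulate one cycle, assuming every check takes check_duration seconds.

    A new worker is started only when no worker is free and the cap has not
    been reached; at the cap, a check waits for the earliest worker to finish.
    Returns (peak_workers, finish_time_of_last_check).
    """
    busy_until = []  # min-heap of times at which busy workers become free
    peak = 0
    finish = 0.0
    for offset, _target in schedule:
        # Workers that finished before this check is due are free again.
        while busy_until and busy_until[0] <= offset:
            heapq.heappop(busy_until)
        start = offset
        if len(busy_until) >= max_workers:
            # At the cap: delay this check until a worker frees up.
            start = heapq.heappop(busy_until)
        end = start + check_duration
        heapq.heappush(busy_until, end)
        peak = max(peak, len(busy_until))
        finish = max(finish, end)
    return peak, finish
```

In this sketch, a finish time later than the Interval means the checks no
longer fit in the window even at the maximum number of workers, which is one
possible way to notice that capacity is being reached.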
Benefits would be that it can expand to handle more targets, and spreads the
load over the window.
Disadvantages would be that it is hard to tell when you're reaching
capacity, and (more importantly) it might be hard to do the optimisation
that MRTG does where a single device is queried once for all interfaces.
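
One possible way around the second disadvantage, sketched here purely as an
assumption and not as how MRTG works internally, would be to interleave by
device rather than by Target, so that each device is still queried once for
all of its interfaces. The (host, interface) tuples and function names are
hypothetical.

```python
from collections import defaultdict


def group_by_device(targets):
    """Group (host, interface) pairs so each host appears once."""
    groups = defaultdict(list)
    for host, interface in targets:
        groups[host].append(interface)
    return dict(groups)


def device_schedule(targets, interval=300.0):
    """Give each device, not each Target, one slot in the interval.

    Returns a list of (start_offset_seconds, host, interfaces) tuples.
    """
    groups = group_by_device(targets)
    if not groups:
        return []
    step = interval / len(groups)
    return [
        (i * step, host, interfaces)
        for i, (host, interfaces) in enumerate(groups.items())
    ]
```

The trade-off is that slots become uneven: a device with many interfaces
takes longer in its slot than a device with one.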
We coded up basically this system here; however, it didn't use MRTG in daemon
mode, which negates a lot of the benefits you can gain from daemon mode and
the new RRD memory-mapped IO. I've not yet looked at coding it directly
into the MRTG code.
Anyone have any thoughts?
Steve
_____
Steve Shipway
***@steveshipway.org
Routers2.cgi web frontend for MRTG/RRD; NagEventLog Nagios agent for Windows
Event Log monitoring; check_vmware plugin for VMWare monitoring in Nagios
and MRTG; and other Open Source projects.
Web: http://www.steveshipway.org/software