Idea for alternative MRTG scheduler

Discussion:

Steve Shipway

2010-12-16 22:36:38 UTC

Here's an idea.

Currently, MRTG will process all Targets until there are none left, using up all the available threads, and then sleep until the next polling cycle.

This can be problematic if you configure more targets, or there is an outage, and suddenly you do not have enough threads to process everything in the 5-minute window. Also, it results in a large burst of activity at the start of the window followed by silence.

So, how about this: MRTG already knows how many targets there are, and the interval. It calculates x = (interval / #targets) × 0.9 (the 0.9 is to allow time for the final checks to complete) and then kicks off a new Target every x seconds, starting a new thread if required (possibly up to a specified upper limit). This could end up with each thread processing a single target and then exiting, with the master starting a new thread per Target.
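
A minimal sketch of the proposed dispatch loop, written in Python rather than MRTG's own Perl. It assumes a hypothetical `poll(target)` callable that does the actual SNMP work; `dispatch_interval`, `run_cycle` and `max_workers` are illustrative names, not MRTG options. A bounded thread pool stands in for "a new thread if required, up to a specified upper limit".

```python
import time
from concurrent.futures import ThreadPoolExecutor


def dispatch_interval(interval_s, n_targets, slack=0.9):
    """Spacing between target starts: x = (interval / #targets) * slack."""
    if n_targets <= 0:
        raise ValueError("need at least one target")
    return interval_s / n_targets * slack


def run_cycle(targets, poll, interval_s=300.0, max_workers=32, slack=0.9):
    """Start one poll every x seconds instead of all polls at once.

    Threads are capped at max_workers (hypothetical limit); if every worker
    is busy, the submitted poll waits in the executor's queue and starts late.
    """
    x = dispatch_interval(interval_s, len(targets), slack)
    start = time.monotonic()
    futures = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for i, target in enumerate(targets):
            # Wait for this target's slot at start + i * x.
            delay = start + i * x - time.monotonic()
            if delay > 0:
                time.sleep(delay)
            futures.append(pool.submit(poll, target))
    # Leaving the with-block waits for all submitted polls to finish.
    return [f.result() for f in futures]
```

With 100 targets and a 300 s interval, a new poll starts roughly every 2.7 s. Reusing pooled threads avoids per-target thread start-up, but once all workers are busy a slow target delays the ones queued behind it, which is the concern raised later in this thread.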

I think this may be how the Nagios check scheduler works? It would certainly solve the problems of (a) uneven CPU usage and (b) running out of window time when you add more targets but not more threads. The drawback is that, of course, you need to have sufficient CPU/memory to handle the potentially large number of threads that could result.
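
The thread count behind that drawback can be estimated with Little's law (average concurrency = start rate × average time each poll holds a thread), a standard queueing result rather than anything from this thread. The sketch assumes polls are spread evenly over 0.9 of the interval and that the mean poll time is known; it gives a steady-state average, not a peak. `expected_threads` is a hypothetical helper.

```python
def expected_threads(n_targets, interval_s, mean_poll_s, slack=0.9):
    """Little's law L = lambda * W for the staggered scheduler.

    lambda: polls started per second when n_targets are spread evenly
            over slack * interval_s seconds.
    W:      average time one poll keeps its thread busy.
    """
    starts_per_second = n_targets / (interval_s * slack)
    return starts_per_second * mean_poll_s


# 2700 targets, 5-minute interval, polls averaging 2 s: about 20 busy threads.
estimate = expected_threads(2700, 300, 2.0)
```

During an outage, polls waiting on SNMP timeouts raise the mean poll time, and the thread count rises in proportion.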

Since Tobi is currently in a coding mood, I thought it best to get the suggestions in quick :)

Steve

________________________________
Steve Shipway
ITS Unix Services Design Lead
University of Auckland, New Zealand
Floor 1, 58 Symonds Street, Auckland
Phone: +64 (0)9 3737599 ext 86487
DDI: +64 (0)9 924 6487
Mobile: +64 (0)21 753 189
Email: ***@auckland.ac.nz
Please consider the environment before printing this e-mail

Tobias Oetiker

2010-12-17 07:23:12 UTC

Hi Steve,

I agree scheduling could be improved. Your approach assumes an even
delay for all targets, I guess ... so if there is a slow target at
a late stage of the spread-out polling activity, it would not have
enough time to complete unless every target is run asynchronously,
causing a large horde of processes or threads.

The motivation for your suggestion seems to be to not overwhelm
devices or networks with intense polling, so I guess one approach
would be to implement some sort of polling rate limit which makes
sure mrtg does not 'kill' anyone by being too hasty.
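
One common way to build such a polling rate limit is a token bucket; this is an assumed technique, not something specified in the thread. In the sketch below, `TokenBucket`, `rate` (average polls per second) and `burst` (polls allowed back to back) are hypothetical names. Each worker would call `acquire()` before polling a target. The clock and sleep functions are injectable so the behaviour can be checked without waiting.

```python
import threading
import time


class TokenBucket:
    """Rate limit: on average `rate` polls per second, bursts of up to `burst`."""

    def __init__(self, rate, burst, clock=time.monotonic, sleep=time.sleep):
        self.rate = float(rate)
        self.capacity = float(burst)
        self.tokens = float(burst)  # start full so an initial burst is allowed
        self.clock = clock
        self.sleep = sleep
        self.last = clock()
        self.lock = threading.Lock()

    def _refill(self):
        # Add tokens for the time elapsed since the last refill, up to capacity.
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def acquire(self):
        """Block until a token is available, then spend it on one poll."""
        # Holding the lock while sleeping makes concurrent callers queue up,
        # so the combined start rate across all threads stays within the limit.
        with self.lock:
            self._refill()
            while self.tokens < 1.0:
                self.sleep((1.0 - self.tokens) / self.rate)
                self._refill()
            self.tokens -= 1.0
```

Unlike a fixed x-second spacing, the bucket lets polling run at full speed while it stays under the limit, and only slows down when the configured rate would be exceeded.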

Post by Steve Shipway
Since Tobi is currently in a coding mood, I thought it best to
get the suggestions in quick :)

I am only processing the mrtg backlog ... no clever new stuff from my
end ... and I intend to do the same for rrdtool now ... and
then publish 1.4.5.

a bugtracker with a backlog is such a sad thing ...

cheers
tobi

--
Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland
http://it.oetiker.ch ***@oetiker.ch ++41 62 775 9902 / sb: -9900

S Shipway

2010-12-17 07:43:10 UTC

The reason I thought the scheduling interval should be multiplied by 0.9 was to leave the last 10% of the interval free for the last targets to complete. Obviously this assumes that no target will take more than 10% of the interval, which would be 30s in a normal 5min interval. The other possibility is to make the window completely sliding, so it doesn't matter if a poll overruns the interval; its result just gets stored in the next interval. This would probably not work so well with the existing scheduler, though.
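
The 0.9 arithmetic can be checked directly: target i starts at i × 0.9 × interval / n, so the last one starts at 0.9 × interval × (n − 1) / n, which always leaves at least 10% of the interval (30 s of 300 s). The `storage_interval` helper is a hypothetical illustration of the sliding-window idea, filing a result under whichever interval it finished in; how MRTG would actually store a late result is not specified here.

```python
def start_offset(i, interval_s, n_targets, slack=0.9):
    """When target i (0-based) starts, relative to the start of the cycle."""
    return i * interval_s / n_targets * slack


def headroom(interval_s, n_targets, slack=0.9):
    """Time left in the interval after the last target has started."""
    return interval_s - start_offset(n_targets - 1, interval_s, n_targets, slack)


def storage_interval(completed_at, cycle_start, interval_s):
    """Sliding-window variant: 0 = this interval, 1 = the next one, and so on."""
    return int((completed_at - cycle_start) // interval_s)


# 100 targets in 300 s: the last starts at about 267.3 s, leaving about 32.7 s.
spare = headroom(300, 100)
```

In closed form the headroom is interval × (0.1 + 0.9 / n), so it shrinks toward exactly 10% as the number of targets grows.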

You could think of the existing Forks: directive as an indicator of the maximum polling rate; the problem is that if this is not high enough to handle the number of defined targets, the system won't automatically increase it (but maybe it could? If the polling cycle doesn't complete within the interval, automatically increase Forks: by one until it reaches some other defined limit?)
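
A sketch of the self-adjusting Forks: idea, assuming a toy cost model in which one cycle takes (targets × mean poll time / forks) seconds. `next_forks`, `settle_forks` and `max_forks` (the "other defined limit") are hypothetical; per this thread, MRTG's Forks: value is fixed and does not adjust itself.

```python
def next_forks(current, cycle_s, interval_s, max_forks):
    """If the last cycle overran the interval, add one fork, up to max_forks."""
    if cycle_s > interval_s and current < max_forks:
        return current + 1
    return current


def settle_forks(n_targets, mean_poll_s, interval_s, forks, max_forks, max_cycles=100):
    """Repeat cycles under the toy model until the fork count stops changing."""
    for _ in range(max_cycles):
        cycle_s = n_targets * mean_poll_s / forks
        new = next_forks(forks, cycle_s, interval_s, max_forks)
        if new == forks:
            return forks
        forks = new
    return forks


# 600 targets at 2 s each need 4 forks to fit in 300 s; starting at 2,
# the count climbs 2 -> 3 -> 4 over two overrunning cycles, then holds.
forks = settle_forks(600, 2.0, 300, forks=2, max_forks=10)
```

With a cap of 3 the same input stops at 3 and keeps overrunning, which is where an alert to the administrator would be more useful than a silent limit.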

I did write an alternative scheduler for MRTG using shellscript that used this algorithm; the disadvantage was that it respawned MRTG for each cfg file individually, so it was not efficient with resources, and it still had the problem that individual targets within a single cfg file were scheduled together.

Steve

Steve Shipway
University of Auckland ITS
UNIX Systems Design Lead
***@auckland.ac.nz
Ph: +64 9 373 7599 ext 86487

--
Sent from the MRTG Developers Mailinglist mailing list archive at Nabble.com.