Ticket #114 (closed defect: fixed)

Opened 4 years ago

Last modified 7 months ago

No data received from host

Reported by: Zanniebal
Owned by: mickem
Priority: 1
Milestone: 0.3.8
Component: CheckSystem
Version: 0.3.0
Severity: Bugs
Keywords:
Cc:

Description (last modified by mickem)

I'm running the latest RCs on Windows 2003 servers.

Sometimes Nagios reports that it isn't receiving data from the host. In the NSClient log I see the following errors:

2007-12-05 16:21:01: error:.\PDHCollector.cpp:159: Failed to query performance counters: \Processor(_total)\% Processor Time: PdhGetFormattedCounterValue failed: -2147481642: A counter with a negative denominator value was detected.

2007-12-05 16:28:41: error:.\PDHCollector.cpp:208: Failed to get Mutex!
2007-12-05 16:28:43: error:.\PDHCollector.cpp:247: Failed to get Mutex!
2007-12-05 16:28:48: error:.\PDHCollector.cpp:234: Failed to get Mutex!

Can you fix it?

Attachments

NSC.ini (6.7 KB) - added by Zanniebak 4 years ago.

Change History

comment:1 Changed 4 years ago by mickem

  • Status changed from new to assigned
  • Description modified

Hmm, it would be interesting to know what you did to get that.

The mutex problem indicates that two "instances" are accessing the same "code segment". This is a long shot, but the only time I can think of when this might happen "normally" is if you query things *just* as the process is "booting" and has yet to start up.

But the first error happens almost 7 minutes earlier, which indicates that this is not the case. I have looked into the "first" error and there are not many hits on it, but it seems to be related to a counter value mismatch (which may be due to checking "too often"). That brings me to the second reason the "mutex" problem may "happen": polling too "quickly". But this is doubtful.
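The contention described above can be sketched as follows. This is only an illustration, assuming the "Failed to get Mutex!" message comes from a timed wait on a lock that another thread is still holding; `data_lock`, `read_value` and `demo` are hypothetical names and not taken from the NSClient++ source.

```python
import threading

# Hypothetical stand-in for the lock that guards the collected counter data.
data_lock = threading.Lock()


def read_value(timeout_s: float = 0.1):
    """Try to read the shared value; give up if the lock stays busy."""
    if not data_lock.acquire(timeout=timeout_s):
        # Someone else (for example the collector) still holds the lock.
        print("Failed to get Mutex!")
        return None
    try:
        return 42  # placeholder for the real counter value
    finally:
        data_lock.release()


def demo():
    # While the "collector" holds the lock, a reader times out.
    with data_lock:
        blocked = read_value(0.05)
    # Once the lock is released, the same read succeeds.
    free = read_value(0.05)
    return blocked, free
```

If the lock holder is slow or the readers arrive too often, the timed wait expires more frequently, which would fit both explanations given above.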

Feel free to paste the nsc.ini and let me know any details as to when/how you get the error.

MickeM
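As an aid when searching for the first error in the description: the log prints the status code as a signed 32-bit integer, while Windows documentation usually lists such codes in hexadecimal. A minimal sketch of the conversion follows; the helper names are made up for illustration. The resulting value, 0x800007D6, is the one the PDH headers define as PDH_CALC_NEGATIVE_DENOMINATOR.

```python
def to_unsigned32(code: int) -> int:
    """Reinterpret a signed 32-bit status code as an unsigned value."""
    return code & 0xFFFFFFFF


def format_code(code: int) -> str:
    """Format a signed status code as 8-digit hexadecimal."""
    return f"0x{to_unsigned32(code):08X}"


# The code reported in the log lines of this ticket.
print(format_code(-2147481642))
```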

comment:2 Changed 4 years ago by mickem

I just uploaded a nightly (check under nightly) that adds an error message for this and then retries 1 second later. It would be interesting to see what that does (but don't expect any miracles).

MickeM
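The behaviour described for the nightly (log the error, wait one second, try once more) can be sketched roughly as below. This is not the NSClient++ code; `CounterError`, `query_with_retry` and the `query` callable are hypothetical, and only the numeric code and the log wording are taken from this ticket.

```python
import time

# Status code as it appears in the log lines of this ticket.
NEGATIVE_DENOMINATOR = -2147481642


class CounterError(Exception):
    """Hypothetical error carrying the status code of a failed query."""

    def __init__(self, code: int):
        super().__init__(f"counter query failed: {code}")
        self.code = code


def query_with_retry(query, delay_s: float = 1.0, sleep=time.sleep):
    """Run query(); on a negative-denominator error, wait and retry once."""
    try:
        return query()
    except CounterError as err:
        if err.code != NEGATIVE_DENOMINATOR:
            raise  # other errors are not retried
        print(f"We got '{err.code}' so we wait {delay_s} second and try again.")
        sleep(delay_s)
        return query()  # a second failure propagates to the caller
```

The `sleep` parameter exists only so the delay can be replaced in tests. A single retry does not help if the counter keeps returning the same error.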

comment:3 Changed 4 years ago by Zanniebal

All right, I installed the nightly on our main servers. I'll check the log in a couple of hours...

comment:4 Changed 4 years ago by Zanniebal

Found something on one of our terminal servers:

2007-12-06 09:44:18: error:d:\Documents and Settings\mickem\Mina dokument\Visual Studio 2005\Projects\NSCP\trunk\include\PDHCounter.h:178: We got '-2147481642: A counter with a negative denominator value was detected.' so we wait 1 second an try again.

2007-12-06 09:44:19: error:.\PDHCollector.cpp:159: Failed to query performance counters: \Processor(_total)\% Processor Time: PdhGetFormattedCounterValue failed: -2147481642: A counter with a negative denominator value was detected.

Need anything more? Thanks in advance!

comment:5 Changed 4 years ago by anonymous

polling frequency (ie. nsc.ini) but I am pretty stumped... that was my one "idea".

Is there any pattern to it, e.g. TS machines, or a certain SP or something? I shall look into it, but in the meantime you can try switching to WMI for CPU load checks.

MickeM

comment:6 Changed 4 years ago by Zanniebal

Aha, well, it doesn't happen too much :)

I will keep checking the logs for a pattern and report if I find one!

I attached the nsc.ini file...

Changed 4 years ago by Zanniebak

comment:7 Changed 4 years ago by anonymous

  • Milestone changed from 0.3.0 to 0.4.0

Hmm, it could be worth trying to change the "poll frequency" to something higher just for kicks :)

CheckResolution=100 (means you get a sample every 10 seconds instead of every second)

But I think I shall have to move this to 0.4.0 and see if I can work it out; at the moment I don't really know what could be wrong. There seem to be "others" who have the same "problem" with other programs, but I don't think it should affect the actual results in Nagios, as it is only the collection thread. But still... not nice...

MickeM
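The arithmetic behind that setting can be sketched as follows, assuming the value is given in tenths of a second (inferred from "100 means every 10 seconds") and that the default is 10 (inferred from "instead of every second"). The `[Check System]` section name and the helper function are assumptions for illustration, so check them against your own NSC.ini.

```python
import configparser

# Hypothetical NSC.ini fragment; the section name is an assumption.
SAMPLE_INI = """
[Check System]
CheckResolution=100
"""


def sample_interval_seconds(ini_text: str, section: str = "Check System") -> float:
    """Return the sampling interval in seconds for the given ini text."""
    cfg = configparser.ConfigParser()
    cfg.read_string(ini_text)
    # Assumed unit: tenths of a second; assumed default: 10 (one second).
    tenths = cfg.getint(section, "CheckResolution", fallback=10)
    return tenths / 10.0


print(sample_interval_seconds(SAMPLE_INI))
```

A larger value means fewer counter queries per minute, which is why it is worth trying if polling too often is the suspected cause.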

comment:8 Changed 4 years ago by anonymous

  • Component changed from Core to CheckSystem

comment:9 Changed 4 years ago by Zanniebal

Today all my logs are spammed with this error... Do you know anything more yet?

2007-12-11 06:08:41: error:d:\Documents and Settings\mickem\Mina dokument\Visual Studio 2005\Projects\NSCP\trunk\include\PDHCounter.h:178: We got '-2147481642: A counter with a negative denominator value was detected.' so we wait 1 second an try again.
2007-12-11 06:08:42: error:.\PDHCollector.cpp:159: Failed to query performance counters: \Processor(_total)\% Processor Time: PdhGetFormattedCounterValue failed: -2147481642: A counter with a negative denominator value was detected.

2007-12-11 06:08:43: error:d:\Documents and Settings\mickem\Mina dokument\Visual Studio 2005\Projects\NSCP\trunk\include\PDHCounter.h:178: We got '-2147481642: A counter with a negative denominator value was detected.' so we wait 1 second an try again.
2007-12-11 06:08:44: error:.\PDHCollector.cpp:159: Failed to query performance counters: \Processor(_total)\% Processor Time: PdhGetFormattedCounterValue failed: -2147481642: A counter with a negative denominator value was detected.

2007-12-11 06:08:45: error:d:\Documents and Settings\mickem\Mina dokument\Visual Studio 2005\Projects\NSCP\trunk\include\PDHCounter.h:178: We got '-2147481642: A counter with a negative denominator value was detected.' so we wait 1 second an try again.
2007-12-11 06:08:46: error:.\PDHCollector.cpp:159: Failed to query performance counters: \Processor(_total)\% Processor Time: PdhGetFormattedCounterValue failed: -2147481642: A counter with a negative denominator value was detected.

2007-12-11 06:08:47: error:d:\Documents and Settings\mickem\Mina dokument\Visual Studio 2005\Projects\NSCP\trunk\include\PDHCounter.h:178: We got '-2147481642: A counter with a negative denominator value was detected.' so we wait 1 second an try again.
2007-12-11 06:08:48: error:.\PDHCollector.cpp:159: Failed to query performance counters: \Processor(_total)\% Processor Time: PdhGetFormattedCounterValue failed: -2147481642: A counter with a negative denominator value was detected.

2007-12-11 06:08:49: error:d:\Documents and Settings\mickem\Mina dokument\Visual Studio 2005\Projects\NSCP\trunk\include\PDHCounter.h:178: We got '-2147481642: A counter with a negative denominator value was detected.' so we wait 1 second an try again.
2007-12-11 06:08:50: error:.\PDHCollector.cpp:159: Failed to query performance counters: \Processor(_total)\% Processor Time: PdhGetFormattedCounterValue failed: -2147481642: A counter with a negative denominator value was detected.

2007-12-11 06:08:51: error:d:\Documents and Settings\mickem\Mina dokument\Visual Studio 2005\Projects\NSCP\trunk\include\PDHCounter.h:178: We got '-2147481642: A counter with a negative denominator value was detected.' so we wait 1 second an try again.
2007-12-11 06:08:52: error:.\PDHCollector.cpp:159: Failed to query performance counters: \Processor(_total)\% Processor Time: PdhGetFormattedCounterValue failed: -2147481642: A counter with a negative denominator value was detected.

2007-12-11 09:37:20: error:.\PDHCollector.cpp:247: Failed to get Mutex!
2007-12-11 09:37:24: error:.\PDHCollector.cpp:208: Failed to get Mutex!
2007-12-11 09:37:24: error:.\PDHCollector.cpp:222: Failed to get Mutex!
2007-12-11 09:37:25: error:.\PDHCollector.cpp:234: Failed to get Mutex!

comment:10 Changed 18 months ago by mickem

  • Status changed from assigned to closed
  • Resolution set to fixed
  • Milestone changed from 0.4.0 to 0.3.8

If this issue is still present, try the new thread_safe PDH implementation.

comment:11 Changed 17 months ago by lovedada

We're seeing this occasionally on 0.3.7. Should we upgrade to 0.3.8 to fix this?

comment:12 Changed 17 months ago by mickem

Hmm, this issue is still "unresolved". There was some work in 0.3.8 which "might" resolve it, but as yet this is unconfirmed.

If you are interested in testing this, let me know and I can give you a heads-up on what you can try.

But again, out of the box an upgrade would probably not resolve this particular issue.

Michael Medin

comment:13 Changed 7 months ago by mickem

  • Version changed from 0.3.0-RC to 0.3.0