NSCA hangs forever while sending data to server


This topic contains 21 replies, has 2 voices, and was last updated by  Legacy Forum User 4 years, 10 months ago.

Viewing 15 posts - 1 through 15 (of 22 total)
  • #1679

    …and by “forever,” I actually mean forever. It stops sending results even though the agent process continues to run.

    We have an installation of the most recent version of NSClient++ (0.3.6 stable) running on a few dozen servers, and most of them work fine, including some 64-bit and some 2008 servers. However, there is one in particular where the agent stops reporting for no apparent reason after only a day or so. The monitoring server is watching for passive checks only, and so it thinks the server is down even though it’s running just fine. This is obviously not ideal.

    Here’s an excerpt from the log on the server with the agent:

    2009-06-17 06:06:21: debug:modules\NSCAAgent\NSCAThread.cpp:245: Sending to server…
    2009-06-17 06:06:21: debug:modules\NSCAAgent\NSCAThread.cpp:252: Looked up [HOST] to [IP]
    2009-06-17 06:06:25: error:modules\DebugLogMetrics\PDHCollector.cpp:216: Failed to query performance counters: PdhCollectQueryData failed: : -2147481643: No data to return.
    2009-06-17 11:28:26: debug:NSClient++.cpp:753: No shared session: ignoring change event!

    (host/IP info hidden for security, but it does resolve correctly)

    The second line (“Looked up …”) is the last real NSCA activity in the log. The next line, about performance counters, is repeated very often throughout the log, both before and after the agent stops sending results. This tells me that the service is still running (the Services management console shows the same thing). The last line shows up only every so often after the agent stops sending results. From that point on, it never executes any more checks and never tries to send any results.

    I looked at the NSCAThread.cpp code for any help, and nothing jumped out at me. I’m unfamiliar with the socket code. Is there any way it could be blocking somehow, attempting a connection with no timeout value in such a way that it never stops trying? Any other possible lock/hang points?

    Help!

    Jeff

    #7619

    I have the same problem here:

    http://nsclient.org/nscp/discussion/topic/357

    I think the error message comes when you log in to the server.

    Restarting the nsclientpp service resolves the problem temporarily.
    I tried nc_net with my setup and experienced the same kind of problems, so I’m not sure whether this is an NSClient++ problem.

    #7621

    Michael Medin
    Keymaster

    What does the debug log say?

    // MickeM

    #7622

    The last lines are always:
    {{{
    2009-07-08 21:47:27: debug:modules\NSCAAgent\NSCAThread.cpp:182: Sending to server…
    2009-07-08 21:47:27: debug:modules\NSCAAgent\NSCAThread.cpp:189: Looked up xxx.xxx.xxx.xxx to xxx.xxx.xxx.xxx
    }}}

    And nothing after that.
    Checks may work for only 2 hours or they may work for a week, but eventually all servers stop sending.
    I also tried with servers on the same network as the Nagios server, with the same result.

    My servers that use active checks work 100% with NSClient++.
    I have 0.3.6 clients on Windows 2003 32-bit servers. It was the same with 0.3.5.

    #7623

    Michael Medin
    Keymaster

    If you are interested, I could hook you up with a build that logs more, and we can see if you can help me track down the problem…

    // MickeM

    #7624

    Michael Medin
    Keymaster

    After browsing the code, I think the problem is in reading the input packet: the code will (I think) read and read and read until it is done, so if it never gets done, it will never finish.

    But it is just a theory, so I would need to verify it (and hopefully fix it).

    // MickeM
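The theory above (a read loop that keeps reading until the packet is complete, with nothing to stop it if the data never arrives) can be illustrated with a minimal sketch. This is not the NSClient++ code, which is C++; it is a Python illustration of the general technique of bounding a "read until done" loop with an overall deadline. The function name `recv_exact` and its parameters are hypothetical.

```python
import socket
import time


def recv_exact(sock: socket.socket, size: int, timeout: float) -> bytes:
    """Read exactly `size` bytes, or raise if `timeout` seconds pass first.

    Hypothetical helper: without the deadline, this loop would block
    forever on a peer that connects but never sends the full packet.
    """
    deadline = time.monotonic() + timeout
    chunks = []
    remaining = size
    while remaining > 0:
        left = deadline - time.monotonic()
        if left <= 0:
            raise TimeoutError(f"{remaining} of {size} bytes still missing")
        # Bound each recv() by the time left for the whole read,
        # not by a fresh per-call timeout.
        sock.settimeout(left)
        try:
            chunk = sock.recv(remaining)
        except socket.timeout:
            raise TimeoutError(f"{remaining} of {size} bytes still missing") from None
        if not chunk:
            # An empty read means the peer closed the connection early.
            raise ConnectionError("peer closed before sending the full packet")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)
```

With a deadline like this, a stalled server turns into an error the caller can log and retry on, rather than a thread that silently stops sending results.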

    #7625

    Yes, I can do that with a few servers.

    #7630

    Hi,

    I have about 20 servers in my setup right now, and all of them stop sending passive checks at some point. All are using the 0.3.6 NSClient++ service.
    Is this problem also present in older versions?

    Thx,

    Leon

    #7632

    Yes, I have tried two older versions, same problem.

    #7635

    Michael Medin
    Keymaster

    If this is what I think it is, it will be present in all versions of NSClient++ and possibly affect other parts as well (for instance the NRPE parts).

    I shall see if I can do a workaround for this in the next version (it will be out as a nightly after the weekend), but for the 0.4.x branch there will be a new socket subsystem, which I hope solves this issue permanently…

    // Michael Medin

    #7641

    Michael Medin
    Keymaster

    Check now, hopefully fixed in the latest nightly build…

    // Michael Medin

    #7644

    Thanks.

    I installed it on 8 servers, and will keep you informed.

    Mikko

    #7650

    I also have installed it on our problem server, and I’ll report back about whether it fails again or seems to be stable. Thanks!

    #7652

    I have been running the nightly build for 24 hours without problems. I have now installed it on 10 more servers at a different site.

    Btw, I had problems in this other domain with the 0.3.7 MSI installer; almost all servers failed with the error message:
    1: Failed to install firewall exception: get_LocalPolicy failed: -2147023143: There are no more endpoints available from the endpoint mapper

    I did a 0.3.6 install and copied the nightly from the .zip over it; that works OK.

    #7653

    Michael Medin
    Keymaster

    Yes the copy works.

    And the issue is related to a disabled (?) Windows firewall. The new installer features a Windows firewall “add exception thingy”, but it is bleeding edge, so it is not 100% ironed out.

    One thing!

    On the servers “which work”, check the nsc.log file for any NSCA-related errors (the problem should, I think, now manifest itself as an error).

    // Michael Medin
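Checking nsc.log for NSCA-related errors, as suggested above, could be scripted roughly as follows. This is only a sketch: it assumes the log lines keep the `timestamp: level:source:line: message` layout seen in the excerpts in this thread, and it treats any source path containing "NSCA" as NSCA-related. The function name `nsca_errors` and the error line in the sample are hypothetical; the actual wording of the new error is not shown in this thread.

```python
import re

# Layout assumed from the excerpts in this thread:
#   2009-06-17 06:06:21: debug:modules\NSCAAgent\NSCAThread.cpp:245: Sending to server…
LOG_LINE = re.compile(
    r"^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}): "
    r"(?P<level>\w+):(?P<source>[^:]+):(?P<line>\d+): (?P<message>.*)$"
)


def nsca_errors(lines):
    """Yield (time, message) for error-level lines logged by an NSCA source file."""
    for raw in lines:
        match = LOG_LINE.match(raw.strip())
        if match and match.group("level") == "error" and "NSCA" in match.group("source"):
            yield match.group("time"), match.group("message")


# Usage with in-memory sample lines; the last line is a made-up
# error used only to show what a match looks like.
sample = [
    r"2009-06-17 06:06:21: debug:modules\NSCAAgent\NSCAThread.cpp:245: Sending to server…",
    r"2009-06-17 06:06:25: error:modules\DebugLogMetrics\PDHCollector.cpp:216: Failed to query performance counters",
    r"2009-06-17 06:07:00: error:modules\NSCAAgent\NSCAThread.cpp:1: hypothetical failure",
]
for when, message in nsca_errors(sample):
    print(when, message)
```

To run it against a real log, pass an open file object instead of `sample`, since the function only needs an iterable of lines.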
