Key: CIB-1112
Type: Bug
Status: Resolved
Resolution: Fixed
Priority: Critical
Assignee: jason
Reporter: Stephen Ng
Votes: 0
Watchers: 0
Pulse

Agent process exiting without warning.

Created: 19/Jun/07 05:11 AM   Updated: 11/Jul/07 12:58 PM
Component/s: Agent
Affects Version/s: 1.2.27
Fix Version/s: 1.2.30


Description
Stephen is having problems with some of his agents dying, and it is happening with increasing frequency. At the moment there is very little to work with, since Pulse is not generating any output during this 'death'.

Options:
a) logging is failing to write error messages
b) the main thread that is keeping Pulse alive is exiting abnormally
c) Pulse is shutting down


We need more information / logging to work out what is going on.

Comments
Daniel Ostermeier - 19/Jun/07 05:12 AM
Updated logging in change 3347; it will be available in the next release.

To activate improved logging, the following configuration changes are required on the slave / agent.

logging.properties:

change console (or file) logging from WARNING to INFO

consoleHandler.level=INFO

logging.default.properties:

uncomment the following:

# debug logging for agent shutdowns
# com.zutubi.pulse.slave.level=FINEST

and change the jetty logging level from WARNING to INFO

org.mortbay.level=INFO


These changes will provide extra logging around the central threads / processes within the agent, as well as logging around the startup/shutdown interface. Hopefully these changes will provide further clues as to how the agent is dying quietly.
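Before restarting an agent it can help to confirm that the edits above actually took effect. The following is a minimal sketch, not part of Pulse: it checks the text of each file for the three settings listed in this comment. The function names are hypothetical, the parser handles only plain key=value lines (no line continuations or ':' separators), and it checks only the console handler, since the key for file logging is not given here.

```python
# Settings taken from the comment above, grouped by the file that holds them.
EXPECTED = {
    "logging.properties": {"consoleHandler.level": "INFO"},
    "logging.default.properties": {
        "com.zutubi.pulse.slave.level": "FINEST",
        "org.mortbay.level": "INFO",
    },
}


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and '#'/'!' comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def missing_settings(filename, text):
    """Return the expected settings that are absent or set to another value."""
    props = parse_properties(text)
    return {
        key: want
        for key, want in EXPECTED.get(filename, {}).items()
        if props.get(key) != want
    }
```

Read each file from wherever the agent installation keeps it and pass its contents to missing_settings(); an empty result means every setting listed for that file is active. A setting that is still commented out is reported as missing.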

Daniel Ostermeier - 19/Jun/07 05:12 AM
From Christian:

We have a Linux agent also showing this problem, but not consistently, i.e. it'll work OK for many days in a row and then start crashing for a couple of night builds. This machine is dedicated to Pulse, so it's not like something else may impact it.

While investigating and solving this problem it would be useful to ping agents through the remote API (not available as far as I can tell from the online doc) so at least a monitoring script can babysit this agent and restart it when needed.

My only other solution right now would be to have a project whose only task is to start dummy builds just to check that an agent is online, restart the agent in case of an error/failure and then trigger the build we want if agents are OK. This would work but it's too kludgy for my taste :-(

Daniel Ostermeier - 19/Jun/07 05:12 AM
From Jason:

Hi Christian,

This one is still giving us the slip, so any further information you could provide would be much appreciated. Do you get any diagnostic indicating why the agent may have died? One thing to check, if you are running as a service/daemon, is if the wrapper is killing the JVM because it thinks that it has hung. If so, messages will appear in the file $PULSE_HOME/logs/wrapper.log. I don't believe this is what Stephen has been seeing, but it has come up with another installation recently.

Regarding monitoring via the remote API, there is actually a function but the documentation was missing! I have added a page now:

http://confluence.zutubi.com/display/pulse0102/RemoteApi.getAgentStatus

This should only be a workaround, though, because we still need to find out why these agents are dying!
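A monitoring script along the lines Christian describes could be sketched roughly as follows. This is an illustration only, under several assumptions: that the remote API is reached over XML-RPC, that the endpoint URL, the RemoteApi.login / RemoteApi.logout token handling and the status strings match your installation, and that restart is a command you supply yourself. Only RemoteApi.getAgentStatus is named in this thread; check the page linked above for the exact signature and return values.

```python
import time
import xmlrpc.client


def get_agent_status(url, user, password, agent):
    """Log in, ask for one agent's status, and always log out again.

    The login/logout calls and the token argument are assumptions about
    the remote API; only getAgentStatus is named in this issue.
    """
    server = xmlrpc.client.ServerProxy(url)
    token = server.RemoteApi.login(user, password)
    try:
        return server.RemoteApi.getAgentStatus(token, agent)
    finally:
        server.RemoteApi.logout(token)


def babysit(check, restart, healthy, polls, interval=60.0, sleep=time.sleep):
    """Poll `check` `polls` times and call `restart` when the agent looks down.

    `healthy` is the set of status values treated as alive (hypothetical,
    supplied by the caller). If the master cannot be reached, the poll is
    skipped rather than treated as an agent failure.
    Returns the number of restarts triggered.
    """
    restarts = 0
    for i in range(polls):
        try:
            status = check()
        except (OSError, xmlrpc.client.Error):
            status = None  # master unreachable or call rejected
        if status is not None and status not in healthy:
            restart()
            restarts += 1
        if i < polls - 1:
            sleep(interval)
    return restarts
```

To use it, wrap get_agent_status in a function that takes no arguments and pass that as check, together with whatever restart action suits the machine. Logging a timestamp inside restart would also record when the agent died, which is exactly the information still missing here.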

Daniel Ostermeier - 19/Jun/07 05:12 AM
From Christian:

Hey Jason,

I agree with your workaround comment and this is the way I intend to treat it. Unfortunately our Pulse agents are manually started since they just run forever once started... except for this one Mandrake box.

I hope that the restart scripts will provide some insight as to when the agent dies (all we know so far is that it never dies during a build).

Thanks for the updated documentation.

jason - 30/Jun/07 08:03 AM
Added a candidate fix in change 3436.

jason - 11/Jul/07 12:57 PM
The candidate fix in 1.2.30 appears to have worked.