From Christian:
We have a Linux agent that also shows this problem, but not consistently: it will work fine for many days in a row and then start crashing for a couple of night builds. This machine is dedicated to Pulse, so nothing else should be interfering with it. While investigating and solving this problem, it would be useful to ping agents through the remote API (not available, as far as I can tell from the online docs) so that at least a monitoring script can babysit this agent and restart it when needed. My only other solution right now would be a project whose only task is to start dummy builds just to check that an agent is online, restart the agent in case of an error/failure, and then trigger the build we want if the agents are OK. This would work, but it's too kludgy for my taste :-(
From Jason:
Hi Christian, This one is still giving us the slip, so any further information you could provide would be much appreciated. Do you get any diagnostics indicating why the agent may have died? One thing to check, if you are running as a service/daemon, is whether the wrapper is killing the JVM because it thinks that it has hung. If so, messages will appear in the file $PULSE_HOME/logs/wrapper.log. I don't believe this is what Stephen has been seeing, but it has come up with another installation recently. Regarding monitoring via the remote API, there is actually a function for this, but the documentation was missing! I have added a page now: http://confluence.zutubi.com/display/pulse0102/RemoteApi.getAgentStatus This should only be a workaround, though, because we still need to find out why these agents are dying!
From Christian:
Hey Jason, I agree with your workaround comment, and this is the way I intend to treat it. Unfortunately, our Pulse agents are started manually, since they just run forever once started... except for this one Mandrake box. I hope that the restart scripts will provide some insight into when the agent dies (all we know for now is that it never dies during a build). Thanks for the updated documentation.
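For anyone wanting to try the monitoring workaround discussed above, here is a minimal Python sketch of a script that asks the master for an agent's status. Only the function name RemoteApi.getAgentStatus comes from this thread. The XML-RPC endpoint path, the login/logout calls, the argument order, the status strings, and every host name and credential are assumptions for illustration, so check them against the remote API documentation for your Pulse version before relying on this.

```python
import xmlrpc.client

# Assumed status strings that mean the agent is usable. Verify these against
# the RemoteApi.getAgentStatus page for your Pulse version.
HEALTHY_STATUSES = {"idle", "building"}


def get_agent_status(server, user, password, agent_name):
    """Log in, query one agent's status, and always log out again."""
    api = server.RemoteApi
    token = api.login(user, password)  # assumed signature
    try:
        return api.getAgentStatus(token, agent_name)  # assumed arguments
    finally:
        api.logout(token)  # assumed signature


def needs_restart(status):
    """Treat any status outside the assumed healthy set as a failure."""
    return status.strip().lower() not in HEALTHY_STATUSES


def main():
    # Hypothetical URL, credentials, and agent name.
    server = xmlrpc.client.ServerProxy("http://pulse.example.com:8080/xmlrpc")
    status = get_agent_status(server, "monitor", "secret", "linux-agent")
    if needs_restart(status):
        print("agent status is %r, restart needed" % status)
    else:
        print("agent status is %r, ok" % status)
```

Calling main() from cron and logging each result with a timestamp would also give a rough record of when the agent goes away, which is the information still missing in this thread.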
To enable the improved logging, make the following configuration changes on the slave / agent.
logging.properties:
Change console (or file) logging from WARNING to INFO:
consoleHandler.level=INFO
logging.default.properties:
Uncomment the following setting (remove the leading # from the property line):
# debug logging for agent shutdowns
# com.zutubi.pulse.slave.level=FINEST
Then change the jetty logging level from WARNING to INFO:
org.mortbay.level=INFO
These changes will provide extra logging around the central threads / processes within the agent, as well as logging around the startup/shutdown interface. Hopefully these changes will provide further clues as to how the agent is dying quietly.
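If several agents need the same changes, the edits above can be scripted. The following is only a sketch: the helper name set_property is hypothetical, it works on the text of a properties file rather than on the file itself, and it assumes each key appears at most once, either active or commented out with a leading #.

```python
import re


def set_property(text, key, value):
    """Set key=value in properties text, uncommenting the key if needed."""
    # Match the key at the start of a line, optionally behind a single '#'.
    pattern = re.compile(
        r"^[ \t]*#?[ \t]*" + re.escape(key) + r"[ \t]*=.*$", re.MULTILINE
    )
    line = "%s=%s" % (key, value)
    if pattern.search(text):
        # A function replacement avoids backslash escapes being interpreted.
        return pattern.sub(lambda match: line, text, count=1)
    # Key not present at all: append it on its own line.
    if text and not text.endswith("\n"):
        text += "\n"
    return text + line + "\n"
```

For example, calling set_property on the contents of logging.default.properties with com.zutubi.pulse.slave.level and FINEST, then with org.mortbay.level and INFO, yields the text described above. Reading and writing the files is left out, and the agent presumably has to be restarted for the new levels to take effect.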