Stephen Nesbitt - 28/Jul/08 06:23 PM
Oh, as a further data point, I have verified that the agent meets all resource requirements. The agent status shows "post recipe" even though nothing appears to be running (as far as I can see). The previous build was cancelled. Is there a problem with cancelling builds?
Hi Stephen,
The post-recipe state is entered after the recipe has completed on an agent, but before the post-stage hooks and notifications are processed. So your agent could be in this state if you have a hook that is still running, or of course because of a bug. In fact, these changes to agent state management are brand new, so I wouldn't be surprised if there are new issues (annoyed, after all the testing of these changes, but not surprised :) ). Do you have any post-stage hooks at all? Does the previous (cancelled) build look complete, or is only the stage complete?

Here is a log showing a bunch of DB access errors. This is using the internal database, and as far as I can tell there should not be any access problems.
I know this issue has the wrong name, and the problems are all over the map, making it difficult to tell one end from the other.
So here's what I know for sure:
1) I tried to cancel a build. It still hadn't completed after several minutes.
2) I bounced (restarted) the Pulse server.
3) All subsequent manual builds (there were no polled builds) ended up in the pending state, even though the agent was marked idle and all resource requirements were met.
4) I cloned the agent and then disabled the original agent (the one that should have been accepting requests). Builds started flowing again.
Bottom line: something really isn't right somewhere, possibly with cancellation as the triggering event. It would also be really nice to have Pulse tell me why a request is pending (other than "the agent is busy") :-) Oh, aren't we having fun today.
OK, the build kicked off, but it is hanging as described above. The build shows as successfully completed, but the only stage performed was bootstrap. There is no indication that the default stage was run, and the Pulse process is using 98-99% of the CPU. An strace on the process ID shows:

    futex(0x80d9134, FUTEX_WAIT, 1, NULL

Meanwhile, the other builds in the queue are just sitting in the pending state. There is nothing in the logs that I can see.

Hi Stephen,
There is certainly something odd going on here. The DB access errors are the most worrying, as they could lead to a host of other issues. Anyhow, a couple of things could help with the diagnosis:
- The stage logs for the build may indicate why things are pending.
- A stack trace of the server should show us what it is trying to do. The easiest way to obtain one, if you have a full JDK installed, is to use jps to get the Pulse PID and jstack to print the dump:

    jsankey@pal:~$ jps -l
    8328 com.zutubi.pulse.command.PulseCtl
    5310 org.tanukisoftware.wrapper.WrapperStartStopApp
    8729 sun.tools.jps.Jps
    jsankey@pal:~$ jstack 8328
    <dump>

These would be much appreciated. And given the excellent help you have given us so far beta testing 2.0, I am sure we can offer you a discount if you decide to purchase! Make sure you remind me ;).

Jason:
Unfortunately, I needed to move forward on getting a demo set up for the customer, so I erased everything and reinstalled. That solved the problem, at the expense of losing the debugging info. Don't worry, if it can be broken, I will find a way ;-) And thanks for the appreciation. I will (or my client will) if we go with Pulse!
-steve