mpiexec hangs on application exit on torque 4.1.3.
djohnson at osc.edu
Sun Nov 11 10:51:34 EST 2012
At Sun, 11 Nov 2012 02:42:56 +0100,
Roy Dragseth wrote:
> I've started to test the latest release of torque and found that it is almost
> usable. The only thing I found problematic so far is that mpiexec hangs on
> application exit. It seems to execute the mpi-application correctly, but it
> never exits unless I hit ctrl-c twice. It doesn't matter if I start a mpi job
> or just execute commands through -comm none.
> contains a debug session. Any hints on how to proceed on this?
Torque 4 has a bug in tm_poll and there are open bug reports with
Adaptive. Internally, the call returns a spurious TM_ENOTCONNECTED
when processes exit normally. The other behavior that appears to be
caused by this bug is that the output of only the first 15 ranks is
shown:
blackbox3:~/mic/mpiexec-mic> ./mpiexec -comm none echo \$MPIEXEC_RANK
I don't think there's a reliable way to work around this bug in
mpiexec. I've tested whether this behavior still exists in the 4.2
beta branch, and unfortunately the bug is still there. I'll follow up
in the next few days.