The script below may not correctly clean up every orphan process you want to remove. I recommend giving LAM/MPI a try instead of MPICH.
To clean up the processes left behind on the nodes after a job exits, an epilogue script is a convenient choice. Here is an example for Torque in the OSCAR 4.x package (it does not suit every scenario):
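#!/bin/sh
# WARNING: ALL processes owned by $USER on the job's nodes will be killed (!!!),
# including any that belong to the same user's other jobs.
echo '--------------------------------------'
echo Running PBS epilogue script
# Set key variables. Torque passes the job id as $1 and the job owner as $2.
USER=$2
NODEFILE=/var/spool/pbs/aux/$1
# Count the distinct nodes (the node file can list a node more than once)
NNODES=`/bin/sort -u "$NODEFILE" | /usr/bin/wc -l`
if [ "$NNODES" -eq 1 ]; then
    # only one node used; nothing is killed in this case
    #su $USER -c "skill -v -9 -u $USER"
    echo Done.
else
    # more than one node used
    echo Killing processes of user $USER on the batch nodes
    # Visit each node once, even if the node file repeats it
    for node in `/bin/sort -u "$NODEFILE"`
    do
        echo Doing node $node
        su $USER -c "ssh -a -k -n -x $node skill -v -9 -u $USER"
    done
    echo Done.
fi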
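Because the epilogue above kills every process the user owns, it can help to see which nodes it would touch before enabling it. The following is only a minimal dry-run sketch, not part of Torque or OSCAR: the helper names list_job_nodes and print_kill_commands are hypothetical, and it prints the same su/ssh/skill command the epilogue uses instead of running it.

```shell
#!/bin/sh
# Dry-run sketch: print the per-node cleanup commands instead of running them.
# list_job_nodes and print_kill_commands are hypothetical helper names.

# Print each distinct host name found in the node file, one per line.
list_job_nodes() {
    sort -u "$1"
}

# Print the command the epilogue would run on each node for the given user.
# $1 is the user name, $2 is the path to the node file.
print_kill_commands() {
    user=$1
    nodefile=$2
    for node in `list_job_nodes "$nodefile"`
    do
        echo "su $user -c \"ssh -a -k -n -x $node skill -v -9 -u $user\""
    done
}
```

Calling print_kill_commands with a user name and a copy of a job's node file prints one line per distinct node, so a node that is listed several times in the file appears only once.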
1 comment:
Actually, this epilogue is considered harmful; please use it only as an example.