The script below may not correctly clean up every orphan process you want to remove. I recommend giving LAM/MPI a try instead of MPICH.
To clean up the processes left on the nodes after a job exits, an epilogue script is a convenient choice. Here is an example for Torque in the OSCAR 4.x package (it will not suit every scenario):
#!/bin/sh
# Please notice that ALL processes from $USER will be killed (!!!)
echo '--------------------------------------'
echo Running PBS epilogue script

# Set key variables
USER=$2
NODEFILE=/var/spool/pbs/aux/$1
PPN=`/bin/sort $NODEFILE | /usr/bin/uniq | /usr/bin/wc -l`

if [ "$PPN" = "1" ]; then
    # only one processor used
    echo Done.
    #su $USER -c "skill -v -9 -u $USER"
else
    # more than one cpu used
    echo Killing processes of user $USER on the batch nodes
    for node in `cat $NODEFILE`
    do
        echo Doing node $node
        su $USER -c "ssh -a -k -n -x $node skill -v -9 -u $USER"
    done
    echo Done.
fi
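Because the epilogue kills every process owned by the user, it can help to see what it would do before letting it run. The following is only a dry-run sketch, not part of Torque or OSCAR: it prints the ssh commands instead of executing them, and it visits each distinct node once. The function names list_unique_nodes and print_cleanup_commands are hypothetical helpers introduced here for illustration; the ssh and skill options are copied from the example epilogue.

```shell
#!/bin/sh
# Dry-run sketch: print the cleanup commands instead of running them.
# list_unique_nodes and print_cleanup_commands are hypothetical helper
# names, not part of Torque or OSCAR.

# Print each distinct host name found in a node file, one per line.
list_unique_nodes() {
    sort -u "$1"
}

# Print the ssh command the epilogue would run on each distinct node.
#   $1 = user name, $2 = path to the node file
print_cleanup_commands() {
    user=$1
    nodefile=$2
    list_unique_nodes "$nodefile" | while read -r node
    do
        # Ignore blank lines in the node file
        [ -n "$node" ] || continue
        echo "ssh -a -k -n -x $node skill -v -9 -u $user"
    done
}
```

Inside an epilogue with the same arguments as the example (job ID in $1, user in $2), the call would be print_cleanup_commands "$2" "/var/spool/pbs/aux/$1". Nothing is killed; the output is only a list of the commands for review.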
1 comment:
Actually, this epilogue is considered harmful; please use it only as an example.