Monday, November 21, 2005

Epilogue Script in OpenPBS

This script may not correctly clean up the orphan processes you want to remove. I recommend giving LAM/MPI a try instead of MPICH.

To clean up the processes left behind on the nodes after a job exits, an epilogue script is a convenient choice. Here is an example for Torque in the OSCAR 4.x package (note that it will not suit every scenario):

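#!/bin/sh
# Torque passes the job ID as $1 and the job owner's user name as $2.
# Please note that ALL processes belonging to $USER will be killed (!!!),
# including those of any other job the user has on the same node.
echo '--------------------------------------'
echo Running PBS epilogue script

# Set key variables
USER=$2
NODEFILE=/var/spool/pbs/aux/$1
# Number of distinct nodes in the job
# (the node file repeats a host once per allocated processor)
NNODES=`/bin/sort -u "$NODEFILE" | /usr/bin/wc -l`
if [ "$NNODES" -eq 1 ]; then
   # only one node used: nothing is killed (the kill below is left disabled)
   echo Done.
   #su $USER -c "skill -v -9 -u $USER"
else
   # more than one node used
   echo "Killing processes of user $USER on the batch nodes"
   # visit each node once, not once per processor
   for node in `/bin/sort -u "$NODEFILE"`
   do
        echo "Doing node $node"
        su "$USER" -c "ssh -a -k -n -x $node skill -v -9 -u $USER"
   done
   echo Done.
fi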
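
A PBS node file repeats a hostname once per allocated processor, so the cleanup only needs to visit each host once. Here is a minimal sketch of that deduplication step on its own. The helper name unique_nodes and the throwaway demo file are my own inventions for illustration, not part of Torque or OSCAR, and the sketch only prints what it would do instead of killing anything:

```shell
#!/bin/sh
# unique_nodes: print each host listed in a PBS-style node file exactly once.
# Assumption: the file holds one hostname per line, repeated once per
# allocated processor. (Hypothetical helper, not shipped with Torque.)
unique_nodes() {
    sort -u "$1"
}

# Demo with a temporary file standing in for /var/spool/pbs/aux/<jobid>
demo=$(mktemp)
printf 'node2\nnode1\nnode2\nnode1\n' > "$demo"
for node in $(unique_nodes "$demo"); do
    # A real epilogue would run its per-node cleanup here.
    echo "would clean $node"
done
rm -f "$demo"
```

With the four-line demo file, the loop body runs twice, once for node1 and once for node2.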

1 comment:

Mengjuei Hsieh said...

Actually, this epilogue is considered harmful; please use it only as an example.