Showing posts with label OpenPBS.

Friday, December 17, 2010

amber11 pmemd + LAM-7.1.4/torque

The default optimization flag for pmemd.MPI is -fast, which causes trouble on our cluster because the Torque library does not get along with -static at all. You may already know that -fast is equivalent to "-xHOST -O3 -ipo -no-prec-div -static". My suggestion is to use "-axSTP -O3 -ipo -no-prec-div" instead. The reason is compatibility: -xHost is not a good optimization flag here either, since the processors in our cluster are not all identical and -xHost just adds one more way for things to break.
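A minimal sketch of how one might swap the flags in. I am assuming here that the compiler options land in $AMBERHOME/src/config.h after configure runs; the file name and path are assumptions, so check where your Amber 11 configure actually writes them before relying on this:

cd $AMBERHOME/src
# After running ./configure as usual, replace -fast with the suggested flags:
sed -i 's/-fast/-axSTP -O3 -ipo -no-prec-div/g' config.h   # file name assumed
make parallel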

Thursday, October 16, 2008

Are you too busy to receive submissions?

A local bash function that returns "no" when the queried node is not running any job at all, and "yes" otherwise.
areyoubusy(){
   # Pull the contents of the <jobs> element out of the node's XML status
   # and count the comma-separated job entries.
   njobs=$(pbsnodes -x "$1" | grep '<jobs>' \
           | sed -e 's/.*<jobs>//;s/<\/jobs>.*$//' \
           | awk -F ',' 'END{print NF}')
   if [ "$njobs" -gt 0 ]; then
      echo "yes"
   else
      echo "no"
   fi
}
# usage: areyoubusy node1.local
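A quick usage sketch for scanning several nodes at once (the node names are made up; substitute your own):

for n in node1.local node2.local node3.local; do
   echo "$n: $(areyoubusy $n)"
done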

Monday, March 10, 2008

Things Sucked.

Memo to myself: always remember to qalter the resource lists of jobs already in the queue after changing the server's default settings.
% sudo qalter 1234.server.local -l cput=360000000
Bonus memo:
% qstat -f | grep -e Job -e cput
Job Id: 1236.raylsrvr.local
    Job_Name = heat_dip
    Job_Owner = mjhsieh@raylsrvr.local
    resources_used.cput = 01:31:38
    Resource_List.cput = 100000:00:00
%
Always check the cput from time to time; it climbs faster than you think. Don't let the system kill your long, un-restartable running jobs.
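A sketch of how one might raise the limit on every job of a given user in one go. The job-id extraction assumes the default layout of "qstat -u" output, and the user name and cput value are illustrative; double-check both before running it:

for job in $(qstat -u mjhsieh | awk '/^[0-9]/{print $1}'); do
   sudo qalter $job -l cput=100000:00:00
done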

Monday, November 21, 2005

Epilogue Script in OpenPBS

This script may not be able to correctly clean up the orphan processes you want removed. I recommend giving LAM/MPI a try instead of MPICH.

To clean up processes left behind after jobs exit the nodes, an epilogue script is a convenient choice. Here is an example (although it does not cover every scenario) for Torque in the OSCAR 4.x package:

#!/bin/sh
# Please notice that ALL processes from $USER will be killed (!!!)
echo '--------------------------------------'
echo Running PBS epilogue script

# Set key variables
USER=$2                          # epilogue argument 2: job owner
NODEFILE=/var/spool/pbs/aux/$1   # epilogue argument 1: job id
NNODES=`/bin/sort $NODEFILE | /usr/bin/uniq | /usr/bin/wc -l`
if [ "$NNODES" = "1" ]; then
   # only one node used; nothing to clean up remotely
   echo Done.
   #su $USER -c "skill -v -9 -u $USER"
else
   # more than one node used
   echo Killing processes of user $USER on the batch nodes
   for node in `/bin/sort $NODEFILE | /usr/bin/uniq`; do
      echo Doing node $node
      su $USER -c "ssh -a -k -n -x $node skill -v -9 -u $USER"
   done
   echo Done.
fi
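For completeness, this is roughly how the script gets installed; the path matches the OSCAR/Torque layout used above, and Torque insists that the epilogue be owned by root with mode 500:

sudo cp epilogue /var/spool/pbs/mom_priv/epilogue
sudo chown root:root /var/spool/pbs/mom_priv/epilogue
sudo chmod 500 /var/spool/pbs/mom_priv/epilogue
# then push it to every node, e.g. with c3's cpush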

Thursday, September 01, 2005

Preventing Users From Logging In to Computing Nodes Without PBS

Normally an experienced Beowulf cluster administrator would suggest that people not log in to the computing nodes directly. However, we users (I am one myself) tend to connect to the computing nodes and run things on them without going through the scheduler (or resource allocator, we might say): it's just a small job, and you don't want to run it on the server because the server is often busy. The administrator will probably be mad, because he or she can no longer keep the resources fairly accessible to everyone. So the following command (only for OSCAR clusters, or any other cluster with PBS/Torque installed) is, I guess, what administrators should recommend their users run:

$ qsub -I -N "interactivejob" -S /bin/tcsh -q workq -l nodes=1:ppn=1
This lets users log in to the computing nodes through the scheduler.

Do we think the users will follow the rules and give up logging in to the nodes? No, we are not stupid. A civilized but lazy way is to beg the users in /etc/motd:

Please! Please do not ssh into the node! We beg you!
Of course this doesn't work on hackers, and unfortunately people usually think of themselves as hackers. So if you put the following local.csh script in /etc/profile.d/ on the nodes, you can stop manual logins through ssh:
if ( ! $?PBS_ENVIRONMENT ) then
   if ( $?SSH_TTY && `whoami` != "root" ) then
      echo; echo please stop login the node thru ssh; echo
      logout
   endif
endif
Or, as Jenna (in #oscar-cluster @ FreeNode) pointed out, use local.sh for bash/sh users:
[ -z "$PBS_ENVIRONMENT" -a "$SSH_TTY" -a `whoami` != "root" ] && logout
This kind of design will not interfere with cexec, MPI, qsub or pbsdsh. It doesn't guarantee that users are absolutely unable to ssh to the nodes, either; if someone is determined to get in, the admin should rely on more civilized communication skills rather than escalate into a technical fight.

As this problem goes away, we now face another one: people just do their stuff on the server because they can't log in to the nodes, and qsub is such a hassle that no genius will use it. Screw you guys, I am going home. 凸

Sunday, January 02, 2005

OpenPBS Tips

These technical notes may only apply to the OSCAR 3.0 package.

  • Node crashes leaving jobs stuck in the R (running) state
    1. Identify the job number.
    2. sudo rm /var/spool/pbs/server_priv/jobs/[that number].*
    3. sudo killall -9 pbs_server
    4. sudo service pbs_server start
  • Jobs not released, stuck in the E (exiting) state
    The same procedure as above applies.
  • pbs_server CPU usage stuck at 100%
    sudo service pbs_server restart
  • Also check my previous article about recovering a dead node.
  • Nodes that are manually cleared/brought back online cannot run jobs submitted before they were activated.
      If you can re-submit those jobs, great; if not, try this:
    1. Identify the job number.
    2. sudo qrun [that number]
  • Shutting down too many nodes and leaving them in the "down" state will kill your pbs_server (it may even mark active nodes as "down"); mark these powered-off nodes "offline" instead (see the pbsnodes sketch after this list).
  • Always check the health of pbs_server/maui
  • Adjust the pbs_server log level, or it will eat up your disk.
  • When a node responds only to ping, it is having a filesystem problem; it needs a reboot and further inspection to find the cause. If a job was running on it, forget that job, restart the node, and follow the steps for "jobs stuck in the R (running) state".
  • Another similar symptom is when you log in to the node and it greets you with this:
    switcher/1.0.10(85):ERROR:102: Tcl command execution failed: if { 
    $have_switcher && ! $am_removing } {
      process_switcher_output "announce" [exec switcher --announce]
    
      # Now invoke the switcher perl script to get a list of the modules
      # that need to be loaded.  If we get a non-empty string back, load
      # them.  Only do this if we're loading the module.
    
      process_switcher_output "load" [exec switcher --show-exec]
    }
    This also indicates that the node is experiencing a filesystem problem.
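A short sketch of the offline-marking step referenced above (the node names are illustrative):

pbsnodes -l                                  # list nodes currently down/offline
sudo pbsnodes -o node07.local node08.local   # mark the powered-off ones offline
# ...and clear the mark once they are back in service:
sudo pbsnodes -c node07.local node08.local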

Wednesday, June 30, 2004

What Hangs pbs_server?

Several problems I have found in OpenPBS are connection related. OpenPBS tends to connect to the nodes sequentially, and the c3 package (which comes with OSCAR) does the same thing. If one connection hangs too long, the whole procedure halts; if the connections are delayed enough to exceed the overall timeout, the whole procedure aborts. On the other hand, if all the connections were made at the same time, they would eat up the LAN bandwidth. Maybe the way out is to change the whole scheme to something event-driven, i.e., pbs_server mostly listens and pbs_mom does the talking.

For now, if PBS refuses to update the node status and keeps returning a stale one because of this timeout issue, all you can do is mark the down nodes offline, yes, manually.

I found an interesting post about how to check which node is the bad one at the moment qstat stops working. I should also point out that we still don't have any nice way to remove the troubled sockets at this time.

From: Karsten Petersen
Subject: [TORQUEUsers] pbs_server hangs when a pbs_mom is down.

On Tue, 14 Oct 2003, Don Brace wrote:
> It is sometimes difficult to determine which node is causing the
> problem. Is there an automated way to determine which node is causing
> the problem?

We see this problem with OpenPBS 2.3.15 about once a month.

You should be able to identify the node by looking at the open
sockets of the pbs_server process.

With Linux:
    lsof -p `pgrep pbs_server` | grep IPv4
    
If everything is running well, it looks like this:
    pbs_serve 10832 root 6u IPv4 937764267 TCP *:pbs (LISTEN)
    pbs_serve 10832 root 7u IPv4 937764286 UDP *:15001 
    pbs_serve 10832 root 8u IPv4 937764287 UDP *:1023 

But if pbs_server hangs (no qstat output), you see several connections
to the dead node that are in the ESTABLISHED state:
    [...]
    pbs_serve 1780 [...] TCP clic0a1:1023->clic4l43:pbs_mom (ESTABLISHED)
    pbs_serve 1780 [...] TCP clic0a1:1022->clic4l43:pbs_mom (ESTABLISHED)
    pbs_serve 1780 [...] TCP clic0a1:1021->clic4l43:pbs_mom (ESTABLISHED)
    pbs_serve 1780 [...] TCP clic0a1:1020->clic4l43:pbs_mom (ESTABLISHED)
    [...]
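Building on the check above, a one-liner that tallies the remote ends of any stuck connections, so the dead node's name pops out of the count. The pbs_mom service name in the output assumes it is listed in /etc/services; you may see a raw port number such as 15002 instead:

lsof -p `pgrep pbs_server` | grep IPv4 | grep ESTABLISHED \
    | sed -e 's/.*->//' -e 's/:.*//' | sort | uniq -c | sort -rn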

Reducing OpenPBS Log in OSCAR 3

With the default settings of the OSCAR 3 package, the OpenPBS server generates tens of megabytes of log every day. That causes storage problems if your /var is not big enough, and a huge log probably causes other problems too. After digging through the OSCAR mailing-list archive, I found my savior: this log level saves a lot of disk space and still makes sense.

This technical note may only apply to the OSCAR software.

From: Jeremy Enos
Subject: Re: [Oscar-users] PBS server log

Looks like the server was set to log everything by default (511).  Here's 
how to change it:

qmgr -c "set server log_events = 127"

Here are the different level descriptions:

  1  Error Events
  2  Batch System/Server Events
  4  Administration Events
  8  Job Events
 16  Job Resource Usage (hex 0x10)
 32  Security Violations (hex 0x20)
 64  Scheduler Calls (hex 0x40)
128  Debug Messages (hex 0x80)
256  Extra Debug Messages (hex 0x100)
Everything turned on is 511. 127 is a good value to use.
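To double-check what the server is currently set to, something like this should work (any account qmgr allows to query the server will do):

qmgr -c "list server" | grep log_events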

Sunday, May 18, 2003

Demo Serial jobs in PBS

This technical note may only apply to the OSCAR software.

This shows how to submit a non-MPI job using OpenPBS on our cluster (an OSCAR system).

1. Download this file from my weblog and save it as nompihello.c or hello.c.

2. Run this:

$ gcc hello.c -o hello
3. Save the following PBS script as, say, hello.pbs. Modify the job name after "-N" and the cd path to reflect your own setup.
#PBS -N "Test"
#PBS -l nodes=1:ppn=1
#PBS -q workq
#PBS -S /bin/sh

# Edit this line to reflect your directory.
cd /home/mjhsieh

./hello >& log-hello
4. Run this:
$ qsub hello.pbs
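After submitting, a quick way to keep an eye on the job (the log file name comes from the redirect in the script above):

$ qstat -u $USER     # Q means queued, R means running
$ cat log-hello      # the program's output, once the job has run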

Demo MPI jobs in PBS

This technical note may only apply to the OSCAR software.

This shows how to submit an MPI job using OpenPBS on our cluster (an OSCAR system).

1. Download this file from the MPC server and save it as mpihello.c or hello.c.

2. Run this:

$ mpicc hello.c -o hello
3. Save the following PBS script as, say, hello.pbs. Modify the node count after "-l nodes=", the job name after "-N", and the cd path to reflect your own setup.
#!/bin/csh
#PBS -N "Test"
#PBS -l nodes=10:ppn=1
#PBS -q workq
#PBS -S /bin/csh

# NP = number of processors = number of lines in the PBS node file
set NP=`(wc -l < $PBS_NODEFILE) | awk '{print $1}'`

# Edit this line to reflect your directory.
cd /home/mjhsieh

mpirun -machinefile $PBS_NODEFILE -np $NP ./hello >& log-hello
4. Run this:
$ qsub hello.pbs
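Once the job is running, qstat's -n flag shows which nodes it landed on, which is handy for checking that the machinefile did its job:

$ qstat -n -u $USER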

Thursday, April 24, 2003

Submitting interactive jobs using PBS

This technical note may only apply to the OSCAR software.

According to the GradEA cluster's documentation, you can submit interactive jobs when you just want to run something simple on a not-so-busy machine. Since our cluster also uses PBS to allocate its computing resources, just like GradEA, surely we can do it too.

[mjhsieh@cluster1 ~]$ qsub -I -l nodes=1:ppn=1
qsub: waiting for job 9787.cluster1 to start
qsub: job 9787.cluster1 ready

[mjhsieh@oscar019 ~]$ find . -name \*.trj -exec bzip2 -9 {} \;
[mjhsieh@oscar019 ~]$ exit
logout

qsub: job 9787.cluster1 completed
See? Pretty easy. Most importantly, you don't run the risk of killing the cluster server when you want to do something essential like compressing data.

Now a problem comes up: how do you use X forwarding if you want to run jobs with an X interface? I tried a lot of combinations and found that you cannot use the ssh tunnel for X forwarding from the nodes when you qsub an interactive job. You can still set your DISPLAY variable by hand on your GradEA account, but we can't in our cluster, simply because our computing nodes have no access outside the local network.
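For reference, here is a sketch of the manual-DISPLAY workaround mentioned above. It only works where the node can reach your workstation directly, the hostname is made up, and your X server has to allow the connection (via xhost or similar):

export DISPLAY=mydesktop.example.edu:0   # hostname is illustrative
xclock &                                 # any small X client, just to test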