Sunday, July 23, 2006

OSCAR 4 with direct links

Direct Links

For computing nodes that have more than one GbE port, it might be a good idea to set up a direct connection between two nodes and take advantage of the extra network bandwidth for parallel computation. What I did was to assign IP addresses to the direct-link ports, for instance, 192.168.0.1 for odd-numbered nodes and 192.168.0.2 for even-numbered nodes.
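As a rough sketch, the direct-link port can be brought up by hand like this (eth1 as the second port and the /24 netmask are assumptions; adjust to your hardware and make it persistent in the node's network config):

# on an odd-numbered node (eth1 assumed to be the direct-link port)
/sbin/ifconfig eth1 192.168.0.1 netmask 255.255.255.0 up
# on its even-numbered partner
/sbin/ifconfig eth1 192.168.0.2 netmask 255.255.255.0 up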

PBS/Torque

In order to reach the specified resources through OpenPBS/Torque, we need to create a customized queue for the paired nodes since it's a peer-to-peer direct link. What I did was to use qmgr and feed it these commands:

# I copied the resource settings from the workq of OSCAR 4;
# they might be different from the default workq of OSCAR 5
create queue subpair01
set queue subpair01 queue_type = Execution
set queue subpair01 resources_max.cput = 10000:00:00
set queue subpair01 resources_max.ncpus = 8
set queue subpair01 resources_max.nodect = 2
set queue subpair01 resources_max.walltime = 10000:00:00
set queue subpair01 resources_min.cput = 00:00:01
set queue subpair01 resources_min.ncpus = 1
set queue subpair01 resources_min.nodect = 1
set queue subpair01 resources_min.walltime = 00:00:01
set queue subpair01 resources_default.cput = 10000:00:00
set queue subpair01 resources_default.ncpus = 1
set queue subpair01 resources_default.nodect = 1
set queue subpair01 resources_default.walltime = 10000:00:00
set queue subpair01 resources_available.nodect = 2
set queue subpair01 enabled = True
set queue subpair01 started = True
set node node1.local,node2.local properties += subpair01
Actually, you can save the commands into a file and use
qmgr < ./commands
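If you want to double-check that the property took, pbsnodes should list subpair01 in each node's properties line:
pbsnodes node1.local node2.local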
By the way, before running parallel computations over the direct link, make sure ssh won't choke on the host-key signatures for the new addresses. You can use C3 to get that out of the way (like: cexec :1-2 ssh 192.168.0.1 uptime; cexec :1-2 ssh 192.168.0.2 uptime). (Actually, there are a lot of potential problems with ssh; I believe the signature issues are mostly taken care of by the OSCAR installation.)
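If ssh still complains about unknown host keys for the new addresses, another way to pre-populate them (assuming home directories are NFS-shared as in a default OSCAR setup) is ssh-keyscan:
ssh-keyscan -t rsa 192.168.0.1 192.168.0.2 >> ~/.ssh/known_hosts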

MPICH

Here I use a PBS script to submit my MPICH jobs; this example is for the AMBER jac benchmark. Please read the script and see how I set MPI_HOST to tell MPICH how to route the message traffic.

#!/bin/sh
#PBS -N "MPICHjob"
#PBS -q subpair01
#PBS -l nodes=2:subpair01:ppn=8
#PBS -S /bin/sh
#PBS -r n
cd /home/demo/MPICH_SUBPAIR
# customized machinefile
cat > machine.subpairN << EOF
192.168.0.1:4
192.168.0.2:4
EOF
# Tell mpich to run through the direct link
export MPI_HOST=`/sbin/ifconfig eth1 | grep "inet addr:" \
                | sed -e 's/inet addr://' | awk '{print $1}'`
# Recommended by Dave Case in the Amber mail list
export P4_SOCKBUFSIZE=524288

# Run
source /opt/intel/fc/9.0/bin/ifortvars.sh
/home/software/mpich_net/bin/mpirun -machinefile ./machine.subpairN -np 8 \
        /home/software/amber9/exe/pmemd.MPICH_NET -O -i mdin.amber9 -c \
        inpcrd.equil -p prmtop -o /tmp/output.txt -x /dev/null -r \
        /dev/null
# Data Retrieval
mv /tmp/output.txt output.pmemd9.MPICH_SUBPAIR
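Assuming the script above is saved as something like jac.mpich.pbs (the file name is just for illustration), it goes in with a plain qsub since the queue is already set inside the script, and qstat -n will show whether it really landed on the paired nodes:
qsub jac.mpich.pbs
qstat -n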

LAM/MPI

Next is the script for LAM/MPI; you can see I still need to specify how the traffic is routed. Also, the first node defined by lamboot may not be the same node that PBS sends you to.

#!/bin/sh
#PBS -N "LAMMPIjob"
#PBS -q subpair01
#PBS -l nodes=2:subpair01:ppn=8
#PBS -S /bin/sh
#PBS -r n
cd /home/demo/LAM
# customized machinefile
cat > machine.subpairN << EOF
192.168.0.1 cpu=4
192.168.0.2 cpu=4
EOF

# If we don't specify -ssi boot rsh, LAM will use the tm boot module and
# the IPs provided by PBS, which go over the OSCAR LAN.
/opt/lam/bin/lamboot -ssi boot rsh -ssi rsh_agent "ssh" -v machine.subpairN

# Run
source /opt/intel/fc/9.0/bin/ifortvars.sh

/opt/lam/bin/mpirun -ssi rpi sysv -np 8 \
        ./sander9.LAM -O -i mdin -c inpcrd.equil -p prmtop \
        -o /tmp/output.txt -x /tmp/trajectory.crd -r /tmp/restart.rst
/opt/lam/bin/lamhalt >& /dev/null
# Data Retrieval
# because the master node is n0, not the first node of PBS
ssh 192.168.0.1 mv /tmp/output.txt /tmp/trajectory.crd /tmp/restart.rst /home/demo/LAM
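If you are not sure which host ended up as n0, running lamnodes right after the lamboot line will print each LAM node id together with the host and cpu count it took from the machinefile:
/opt/lam/bin/lamnodes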
