Monday, June 15, 2009

Fedora 8 x86_64 server, OSCAR 5.x, ia32 nodes oh my!

Please leave a comment below if anyone is reading this.

First of all, my server is a Dell PE2950 with 8 GB of memory and 6 network ports. My first attempt used CentOS 5.3 x86_64 with OSCAR 6.0.3-1, but OSCAR 6.0.3-1 was not stable enough to get through the installation without errors. Fedora 9 with OSCAR 6.0.3-1 did not work either, because of many Perl package problems. When I first fell back to OSCAR 5.1+, I tried it on Fedora 9, wrongly assuming the two were compatible; unfortunately they are not. So here I come, Fedora 8 x86_64! My goal is to install a cluster of ia32 (i386) nodes on an x86_64 server, using all 6 network ports.

Here are some notes:

  1. I installed Fedora 8 from disc plus network with the default layout (the only change was selecting the software development option), so I only have a 2 GB swap partition and need an additional swap file: # dd if=/dev/zero of=/swapfile0 bs=1M count=8192
  2. # chmod 600 /swapfile0; mkswap /swapfile0; swapon /swapfile0
  3. # echo "/swapfile0 swap swap defaults 0 0" >> /etc/fstab
  4. # chkconfig NetworkManager off (after a text-mode installation of FC8 this is already off by default)
  5. # chkconfig network on (after a text-mode installation of FC8 this is already on by default)
  6. # chkconfig iptables off
  7. # chkconfig ip6tables off
  8. Make sure iptables and NetworkManager are really turned off; the GUI tools can be misleading.
  9. # perl -pi -e 's/SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config
  10. # yum -y install bridge-utils gnuplot grace
  11. # brctl addbr br0
  12. # brctl addif br0 eth1; brctl addif br0 eth2; brctl addif br0 eth3; brctl addif br0 eth4; brctl addif br0 eth5
  13. edit /etc/sysconfig/network-scripts/ifcfg-eth[1-5] to set BOOTPROTO=none, ONBOOT=yes, BRIDGE=br0, NM_CONTROLLED=no, IPV6INIT=no, IPV6_AUTOCONF=no
  14. edit /etc/sysconfig/network-scripts/ifcfg-eth0 to set NM_CONTROLLED=no, IPV6INIT=no, IPV6_AUTOCONF=no
  15. create /etc/sysconfig/network-scripts/ifcfg-br0 with the following contents:
    DEVICE=br0
    TYPE=Bridge
    BOOTPROTO=static
    IPADDR=10.0.0.254
    NETMASK=255.255.255.0
    ONBOOT=yes
    NOZEROCONF=yes
    DELAY=0
    STP=no
    NM_CONTROLLED=no
    IPV6INIT=no
    IPV6_AUTOCONF=no
  16. create /etc/modprobe.d/disableipv6 with one line:
    install ipv6 /bin/true
  17. edit /etc/sysconfig/network to set the hostname, and also run the "hostname" command to change the current setting.
  18. Add oscar_server, nfs_oscar and pbs_oscar as names for 10.0.0.254 in /etc/hosts
  19. Add all the names of this server to /etc/mail/local-host-names
  20. Add
    ALL  : 10.0.0.0/255.255.255.0,localhost,the-external-ip-for-your-server
    sshd : 10.0.0.0/255.255.255.0,.uci.edu
    to /etc/hosts.allow (my domain is .uci.edu).
  21. Add
    ALL    : ALL EXCEPT LOCAL
    to /etc/hosts.deny.
  22. If httpd was installed, add
    ServerAdmin yourname@yourmail.box
    ServerSignature Off
    ServerTokens Prod
    to a new file, /etc/httpd/conf.d/lab.conf
  23. # yum remove NetworkManager.i386 (important if you want a full update with yum)
  24. # yum update (optional, but it should be very helpful.)
  25. # reboot
  26. I have doubts about OSCAR 6, so I chose OSCAR 5.x from the nightly branch and downloaded oscar-repo-common-rpms-5*nightly-*.tar.gz, oscar-repo-fc-8-x86_64-5*nightly-*.tar.gz and oscar-repo-fc-8-i386-5*nightly-*.tar.gz
  27. # mkdir -p /tftpboot/distro /tftpboot/oscar; tar xzfC oscar-repo-common-rpms-*.tar.gz /tftpboot/oscar/; tar xzfC oscar-repo-fc-8-x86_64-5*nightly-*.tar.gz /tftpboot/oscar/; tar xzfC oscar-repo-fc-8-i386-5*nightly-*.tar.gz /tftpboot/oscar/
  28. # perl -pi -e 's/gpgcheck=1/gpgcheck=0/' /etc/yum.conf
  29. # yum install createrepo /tftpboot/oscar/common-rpms/yume*.rpm
  30. # yume --repo /tftpboot/oscar/common-rpms install oscar-base
  31. # rsync -avx --delete --bwlimit=128 rsync://archive.fedoraproject.org/fedora-archive/fedora/linux/releases/8/Fedora/x86_64/os/Packages/ /tftpboot/distro/fedora-8-x86_64/
  32. # rsync -avx --delete --bwlimit=128 rsync://archive.fedoraproject.org/fedora-archive/fedora/linux/releases/8/Fedora/i386/os/Packages/ /tftpboot/distro/fedora-8-i386/
  33. create /tftpboot/distro/fedora-8-i386.url with one line:
    file:/tftpboot/distro/fedora-8-i386
    because this file is not generated automatically. (I am not sure whether fedora-8-x86_64.url was generated or bundled with the packages; check it anyway.)
  34. # cd /opt/oscar/lib
  35. # ../scripts/repo-update --url http://archive.fedoraproject.org/pub/archive/fedora/linux/updates/8/i386.newkey --repo /tftpboot/distro/fedora-8-i386
  36. # ../scripts/repo-update --rmdup --repo /tftpboot/distro/fedora-8-i386
  37. # yume --prepare --repo /tftpboot/distro/fedora-8-i386
  38. # ../scripts/repo-update --url http://archive.fedoraproject.org/pub/archive/fedora/linux/updates/8/x86_64.newkey --repo /tftpboot/distro/fedora-8-x86_64
  39. # ../scripts/repo-update --rmdup --repo /tftpboot/distro/fedora-8-x86_64
  40. # rm /tftpboot/distro/*/*torque* /tftpboot/distro/*/openmpi*rpm
  41. # yume --prepare --repo /tftpboot/distro/fedora-8-x86_64
  42. Make sure that /tftpboot/distro/fedora-8-x86_64.url, /tftpboot/distro/fedora-8-i386.url, /tftpboot/oscar/fc-8-i386.url and /tftpboot/oscar/fc-8-x86_64.url exist and contain correct URL information.
  43. # perl -pi -e 's/^#PermitRootLogin yes/PermitRootLogin yes/' /etc/ssh/sshd_config
  44. # yum install perl-AppConfig
  45. # perl -pi -e 's/\/usr\/sbin\/netbootmgr/\/usr\/bin\/netbootmgr/' /opt/oscar/scripts/oscar_wizard
  46. # cd /opt/oscar; env OSCAR_VERBOSE=3 ./install_cluster br0
  47. Uncheck loghost in the package list, because it does not work on FC8.
  48. Normally you don't need to worry about the following configuration step. I didn't change anything, although ganglia seems like a good place to adjust the default settings.
  49. install server packages
  50. Remember that every time you run "./install_cluster", the settings you made in the previous install_cluster session are lost, and you have to redo the settings and reinstall the packages. To avoid that: once the "install server packages" step is done, if you exit ./install_cluster for any reason, you can use oscar_wizard instead. Open a new shell window (not your old ./install_cluster shell), then run: cd /opt/oscar/scripts; ./oscar_wizard
  51. revise /opt/oscar/oscarsamples/scsi.disk for the disk layout of SATA/SCSI nodes.
  52. revise /opt/oscar/oscarsamples/fc-8-i386.rpmlist
  53. build an image for i386 (ia32) nodes
  54. Use another shell window to modify the image: # chroot /var/lib/systemimager/images/i386image chkconfig avahi-daemon off
  55. Define a first node for testing. Remember to change the name prefix from oscarnode to node, since 'oscarnode' is too long.
  56. Click 'setup networking', start collecting the MAC addresses and assign IPs, click 'Stop collecting MACs', click 'Configure DHCP Server', then click 'Setup Network Boot' and wait for the 'okay' popup.
  57. Open a new shell window to fix /tftpboot/kernel and /tftpboot/initrd.img. These two files are mistakenly linked to the x86_64 versions because the image architecture was labelled x86_64, which is not what we want for ia32 nodes.
    # rm /tftpboot/kernel /tftpboot/initrd.img
    # cd /tftpboot
    # cp -p /usr/share/systemimager/boot/i386/standard/kernel install-kernel-i386
    # cp -p /usr/share/systemimager/boot/i386/standard/initrd.img install-initrd-i386.img
    # ln -s install-kernel-i386 kernel
    # ln -s install-initrd-i386.img initrd.img
  58. In the nodes' BIOS, disable the floppy drive, Hyper-Threading, the missing-keyboard warning, etc.
  59. Make sure each node is set to PXE boot first in the boot order.
  60. Boot and install (I presume it went smoothly.)
  61. After the reboot, make sure the network interfaces are correct: log in and ping 10.0.0.254 to see whether the network works. On more than 10% of my nodes, eth0 and eth1 came up in the wrong order and I could not alias them back into the correct order. Damn old hardware.
  62. In my case, I fired up netbootmgr and set the nodes with network failures to "install".
  63. Reboot the failed nodes and check the BIOS settings; make sure the RAM setting is at "default". All of my failed nodes had wrong RAM settings.
  64. Another reason a node may fail to install is that the network is too busy; in that case, simply rebooting the node will do.
  65. After first node got installed successfully, click the 'Complete Cluster Setup' button.
  66. Don't forget to click the test cluster button
  67. Now install 80 more nodes.
  68. post installation modification:
    # perl -pi -e 's/tmpwatch -x \/tmp/tmpwatch -m -x \/tmp/;s/10d/100d/' /var/lib/systemimager/images/i386image/etc/cron.daily/tmpwatch
  69. post installation modification: # cpush /var/lib/systemimager/images/i386image/etc/cron.daily/tmpwatch /etc/cron.daily/tmpwatch
  70. post installation modification: append this to the end of /var/lib/systemimager/images/i386image/etc/sysctl.conf :
    # for g03 (added by mengjuei hsieh)
    kernel.randomize_va_space = 0
  71. post installation modification: # env -u DISPLAY cexec sysctl -w kernel.randomize_va_space=0
  72. post installation modification: append this to the end of /var/lib/systemimager/images/i386image/etc/sysctl.conf :
    # suggestion from ibm redbook (added by mjhsieh)
    vm.overcommit_memory = 1
  73. post installation modification: # env -u DISPLAY cexec 'sysctl -w vm.overcommit_memory=1'
  74. post installation modification: # cpush /var/lib/systemimager/images/i386image/etc/sysctl.conf /etc/sysctl.conf
  75. post installation modification: # env -u DISPLAY cexec :1-81 cp -pr /home/software/compilers-i386 /opt/compilers
  76. post installation modification (needed by the compilers):
    # yume --installroot /var/lib/systemimager/images/i386image install compat-libstdc++-33
    # cexec yume -y install compat-libstdc++-33
    # echo "compat-libstdc++-33" >> /opt/oscar/oscarsamples/fc-8-i386.rpmlist
  77. For some reason, pbs_mom may also be running on the head node even if you disabled it at the beginning; remember to check.
  78. For some reason torque-docs was not installed. I fixed this with # yume install torque-docs; make sure you get OSCAR's torque-docs, not the one from FC8.
  79. I have 79 Pentium 4 nodes and 2 nodes with old (ia32) Xeons, so I created a PBS queue for the Pentium 4 nodes in order to allocate them separately. Use the command "# qmgr < P4.setting", where the file P4.setting contains:
    #
    # Create queues and set their attributes.
    #
    #
    # Create and define queue p4
    #
    create queue p4
    set queue p4 queue_type = Execution
    set queue p4 resources_max.cput = 10000:00:00
    set queue p4 resources_max.ncpus = 4
    set queue p4 resources_max.nodect = 2
    set queue p4 resources_max.walltime = 10000:00:00
    set queue p4 resources_min.cput = 00:00:01
    set queue p4 resources_min.ncpus = 1
    set queue p4 resources_min.nodect = 1
    set queue p4 resources_min.walltime = 00:00:01
    set queue p4 resources_default.cput = 10000:00:00
    set queue p4 resources_default.ncpus = 1
    set queue p4 resources_default.nodect = 1
    set queue p4 resources_default.walltime = 10000:00:00
    set queue p4 resources_available.nodect = 2
    set queue p4 enabled = True
    set queue p4 started = True
    #
    # Assign nodes to queue p4
    #
    set node node1.local,node2.local,node3.local,node4.local,node5.local,node6.local,node7.local,node8.local,node9.local,node10.local,node11.local,node12.local,node13.local,node14.local,node15.local,node16.local,node17.local,node18.local,node19.local,node20.local,node21.local,node22.local,node23.local,node24.local,node25.local,node26.local,node27.local,node28.local,node29.local,node30.local,node31.local,node32.local,node33.local,node34.local,node35.local,node36.local,node37.local,node38.local,node39.local,node40.local,node41.local,node42.local,node43.local,node44.local,node45.local,node46.local,node47.local,node48.local,node49.local,node50.local,node51.local,node52.local,node53.local,node54.local,node55.local,node56.local,node57.local,node58.local,node59.local,node60.local,node61.local,node62.local,node63.local,node64.local,node65.local,node66.local,node67.local,node68.local,node69.local,node70.local,node71.local,node72.local,node73.local,node74.local,node75.local,node76.local,node77.local,node78.local,node79.local properties+=p4
  80. # curl http://mjhsieh.googlecode.com/svn/trunk/OpenPBS/free-nodes -o /usr/local/bin/free-nodes; chmod +x /usr/local/bin/free-nodes
  81. # curl http://mjhsieh.googlecode.com/svn/trunk/OpenPBS/qterm -o /usr/local/bin/qterm; chmod +x /usr/local/bin/qterm
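
Steps 13 to 15 can also be scripted. This is only a minimal sketch, not something OSCAR provides: write_bridge_ifcfg is a hypothetical name, and it assumes, exactly as above, that eth1 to eth5 are the bridge members and that br0 takes 10.0.0.254/24. It rewrites ifcfg-eth1..5 from scratch but carries over any HWADDR= line the installer wrote, so each ethN name stays tied to the same card. eth0 (step 14) is left alone.

```shell
#!/bin/sh
# Hypothetical helper (not part of OSCAR): regenerate the bridge config of
# steps 13-15. Assumes eth1..eth5 join br0 and br0 owns 10.0.0.254/24.
# eth0, the external interface, is not touched.
write_bridge_ifcfg() {
    dir=${1:-/etc/sysconfig/network-scripts}

    for i in 1 2 3 4 5; do
        f="$dir/ifcfg-eth$i"
        # Keep the installer's HWADDR= line, if any, so ethN stays on the same NIC.
        hw=$(grep '^HWADDR=' "$f" 2>/dev/null)
        {
            echo "DEVICE=eth$i"
            if [ -n "$hw" ]; then echo "$hw"; fi
            printf '%s\n' BOOTPROTO=none ONBOOT=yes BRIDGE=br0 \
                NM_CONTROLLED=no IPV6INIT=no IPV6_AUTOCONF=no
        } > "$f.new" && mv "$f.new" "$f"
    done

    # The bridge itself carries the cluster-side address (step 15).
    cat > "$dir/ifcfg-br0" <<'EOF'
DEVICE=br0
TYPE=Bridge
BOOTPROTO=static
IPADDR=10.0.0.254
NETMASK=255.255.255.0
ONBOOT=yes
NOZEROCONF=yes
DELAY=0
STP=no
NM_CONTROLLED=no
IPV6INIT=no
IPV6_AUTOCONF=no
EOF
}
```

Source the file (for example ". ./bridge.sh") and run write_bridge_ifcfg as root with no argument to write the real files. Passing a scratch directory instead, e.g. write_bridge_ifcfg /tmp/ifcfg-test, lets you inspect the result before touching /etc.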
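
The check in step 42 is easy to get wrong by eye. Here is a minimal sketch with a hypothetical function name, check_url_files. It assumes each .url file holds one URL per line, like the one created in step 33, and it only verifies that file: URLs point at existing directories; any other kind of URL is reported but not tested.

```shell
#!/bin/sh
# Hypothetical helper: sanity-check the .url files of step 42.
# Assumes one URL per line; only file: URLs are checked against the disk.
check_url_files() {
    status=0
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "MISSING or empty: $f"
            status=1
            continue
        fi
        # "|| [ -n ... ]" also handles a last line without a trailing newline.
        while IFS= read -r url || [ -n "$url" ]; do
            case $url in
                ''|'#'*) ;;  # skip blank and commented lines
                file:*)
                    dir=${url#file:}
                    if [ ! -d "$dir" ]; then
                        echo "BAD: $f points to $dir, which is not a directory"
                        status=1
                    fi ;;
                *) echo "NOTE: $f uses $url (not checked)" ;;
            esac
        done < "$f"
    done
    return $status
}
```

For example: check_url_files /tftpboot/distro/fedora-8-x86_64.url /tftpboot/distro/fedora-8-i386.url /tftpboot/oscar/fc-8-i386.url /tftpboot/oscar/fc-8-x86_64.url && echo all good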

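Typing the 79 node names in the last line of P4.setting by hand is error-prone. A minimal sketch that prints that line instead; p4_node_line is a hypothetical name, and it assumes the nodes are numbered consecutively from node1 with the .local suffix, as in my cluster (seq, sed and paste are the usual coreutils tools on Fedora).

```shell
#!/bin/sh
# Hypothetical helper: print the qmgr "set node" line of step 79 for
# nodes <prefix>1 .. <prefix><count>, e.g. node1.local .. node79.local.
p4_node_line() {
    count=$1
    prefix=${2:-node}
    suffix=${3:-.local}
    # One name per line, then join them with commas.
    names=$(seq 1 "$count" | sed "s/^/$prefix/; s/\$/$suffix/" | paste -s -d , -)
    echo "set node $names properties+=p4"
}
```

Append its output after the queue definitions, e.g. p4_node_line 79 >> P4.setting, and then run # qmgr < P4.setting as before.
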
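Steps 70 to 74 append lines to the image's sysctl.conf by hand, so repeating them leaves duplicate keys. A minimal sketch of an idempotent alternative, with a hypothetical function name, set_sysctl_conf: it replaces an existing uncommented "key = value" line or appends a new one. It relies on GNU sed's -i option (present on Fedora) and does not add the comment lines I used.

```shell
#!/bin/sh
# Hypothetical helper: set "key = value" in a sysctl.conf-style file,
# replacing an existing uncommented entry for the key or appending one.
# Uses GNU sed -i for in-place editing.
set_sysctl_conf() {
    file=$1 key=$2 value=$3
    # Escape the dots so that e.g. vm.overcommit_memory matches literally.
    pat=$(printf '%s' "$key" | sed 's/\./\\./g')
    if grep -q "^[[:space:]]*$pat[[:space:]]*=" "$file" 2>/dev/null; then
        sed -i "s|^[[:space:]]*$pat[[:space:]]*=.*|$key = $value|" "$file"
    else
        echo "$key = $value" >> "$file"
    fi
}
```

For example, set_sysctl_conf /var/lib/systemimager/images/i386image/etc/sysctl.conf vm.overcommit_memory 1, followed by the cpush of step 74.
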
Here is the final result:

.------------------------------------------------------------.
|     * OpenPBS NODES REPORT (v0.052) * (by mjhsieh)         |
`------------------------------------------------------------'
      Queue         Free       CPU       Nodes     Nodes Down 
      Name          Nodes      in use    Defined   or Offline 
      ----------    -----      -----     -----     -----      
      p4            79         0         79        0
      --------------------------------------------------
      Summary:      81         0         81        0
                    There are 0 job(s) queued.

Further commands

  1. Example to update node image: # yume --installroot /var/lib/systemimager/images/i386image update
  2. No, the previous command cannot be replaced by # chroot /var/lib/systemimager/images/i386image plus some other commands.
  3. Example to install/update nodes package: # cexec yume install -y vim-enhanced
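
The commands above, together with step 76, can be wrapped so that a package always goes into both the image and the running nodes, and is recorded in the rpmlist for the next image build. This is a minimal sketch: add_node_package, remember_pkg and the DRY_RUN switch are hypothetical names, and IMAGE and RPMLIST default to the paths used in this post. The image step uses the same yume form as step 76, so it still asks for confirmation. With DRY_RUN=1 it only prints the yume and cexec commands.

```shell
#!/bin/sh
# Hypothetical wrapper around the yume/cexec commands used in this post.
IMAGE=${IMAGE:-/var/lib/systemimager/images/i386image}
RPMLIST=${RPMLIST:-/opt/oscar/oscarsamples/fc-8-i386.rpmlist}

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Add a package name to the rpmlist once (fixed-string, whole-line match).
remember_pkg() {
    grep -qxF "$1" "$RPMLIST" 2>/dev/null || echo "$1" >> "$RPMLIST"
}

add_node_package() {
    for pkg in "$@"; do
        run yume --installroot "$IMAGE" install "$pkg"   # the golden image
        run env -u DISPLAY cexec yume -y install "$pkg"  # the running nodes
        run remember_pkg "$pkg"                          # future image builds
    done
}
```

For example, DRY_RUN=1 add_node_package vim-enhanced shows what would be run; drop DRY_RUN to actually do it.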

1 comment:

Mengjuei Hsieh said...

It's very hard to believe that I need yet another installation 3 months later.