The configuration used in this benchmark:
- OSCAR 4.2 (pre-beta version) with Fedora Core 3 Linux
- Intel Xeon 2.8 GHz with 1 MB cache
- Direct Gigabit Ethernet (GbE) connection between the two machines
- LAM/MPI and MPICH builds of PMEMD from the AMBER 8 distribution
- Following the document provided by Dr. Duke, P4_SOCKBUFSIZE is set to 524288 for MPICH (and /etc/sysctl.conf on the nodes has to be changed accordingly; see the sketch after this list).
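The exact kernel settings are not quoted in the document, so here is a minimal sketch, assuming the standard Linux socket-buffer keys, with values chosen to match P4_SOCKBUFSIZE (apply with "sysctl -p" after editing):

    # Raise the per-socket buffer ceilings so MPICH's requested
    # 524288-byte socket buffers are not silently capped by the kernel
    net.core.rmem_max = 524288
    net.core.wmem_max = 524288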
Figure 1. The JAC benchmark on different node/processor combinations. "Spreading nodes first" means distributing the processes across as many nodes as possible. In this plot we can see that the performance of 8 processes (4 on each node) without hyper-threading is actually much worse than 4 processes (2 on each node). It also suggests that hyper-threading is good for stress testing.
Figure 2. This plot shows that if we populate one node fully before populating the other, the scaling is linear (ignoring the 2-process calculation).
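For reference, the two placement policies in Figures 1 and 2 can be expressed through the ordering of hosts in the MPICH machines file, since ranks are (to a first approximation) assigned to hosts in the order listed; the hostnames here are hypothetical:

    # "Spreading nodes first": alternate hosts, so 4 processes land 2 per node
    node1
    node2
    node1
    node2

    # "Populate one node first": exhaust node1's 4 logical CPUs before node2
    node1
    node1
    node1
    node1
    node2
    node2
    node2
    node2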
The scaling of the JAC calculation seems fine, probably because the footprints of PMEMD and of the simulation system are small. A benchmark on a bigger system may be needed (the JAC simulation uses the DHFR protein, 159 amino acid residues, with explicit water representation). I also tried the jumbo frames setting (GbE with MTU > 1500); as other reports predicted, it does slow down the LAM calculation.
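For completeness, the jumbo frames test amounts to raising the MTU on the GbE interface of both nodes; a minimal sketch, assuming the interface name is eth0:

    # Enable jumbo frames (9000-byte MTU) on the direct GbE link
    /sbin/ifconfig eth0 mtu 9000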
Conclusion:
- Hyper-threading does help.
- Scaling on a hyper-threaded machine can be linear, depending on how you look at it.
References:
- Dr. Bob Duke, "Using Intel compilers (ifc8) with PMEMD"
- Joint AMBER/CHARMM DHFR benchmark; the information can be found at the AMBER benchmark website
- Gelb Research Group, Washington University in St. Louis, "Fjord" - a Linux cluster