Scalability Results

Model Description

These scalability tests were run using CitcomS 3.2.0 with the default configuration. The mesh is a regional cap with 129x129x129 nodes, so the total number of velocity unknowns is 129^3 x 3, or approximately 6.4 million. The model is run for 11 time steps, and the reported result is the total wall clock time. Each node on this cluster has two Xeon 5680 series 3.33GHz hex-core processors with a 12MB unified L3 cache and 24GB RAM, for a total of 12 cores per node. The interconnect is QDR InfiniBand.
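The problem-size arithmetic above can be checked with a short Python sketch. The function names are illustrative only and are not part of CitcomS. The node-count estimate assumes one MPI process per core, which the text does not state.

```python
import math

NODES_PER_DIM = 129      # mesh nodes along each direction (from the text)
VELOCITY_COMPONENTS = 3  # three velocity components per mesh node
CORES_PER_NODE = 12      # two hex-core processors per cluster node


def velocity_unknowns(n=NODES_PER_DIM, components=VELOCITY_COMPONENTS):
    """Total velocity unknowns for an n x n x n mesh."""
    return n ** 3 * components


def cluster_nodes_needed(total_procs, cores_per_node=CORES_PER_NODE):
    """Cluster nodes required, ASSUMING one MPI process per core."""
    return math.ceil(total_procs / cores_per_node)


print(velocity_unknowns())        # 6440067, i.e. about 6.4 million
print(cluster_nodes_needed(256))  # 22 under the one-process-per-core assumption
```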

Partition   Total Procs   Wall Time (sec)   Speedup   Scalability
1x1x1       1             47217             1.000     1.000
1x1x2       2             25466             1.854     0.927
1x1x4       4             14645             3.224     0.806
2x2x1       4             14438             3.270     0.818
2x2x2       8             8980              5.258     0.657
2x2x4       16            4432              10.654    0.666
4x4x1       16            5367              8.798     0.550
4x4x2       32            2460              19.194    0.600
4x4x4       64            1346              35.079    0.548
8x8x2       128           583               80.990    0.633
8x8x4       256           337               140.110   0.547
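The Speedup and Scalability columns are consistent with the usual definitions: speedup is the 1x1x1 wall time divided by the wall time of the run, and scalability is speedup divided by the total processor count. The Python sketch below reproduces a few rows under that reading. The function names are illustrative and not taken from CitcomS.

```python
BASELINE_SECONDS = 47217  # wall time of the 1x1x1 run


def speedup(wall_seconds, baseline=BASELINE_SECONDS):
    """Baseline wall time divided by the wall time of this run."""
    return baseline / wall_seconds


def scalability(wall_seconds, total_procs, baseline=BASELINE_SECONDS):
    """Speedup per processor (parallel efficiency)."""
    return speedup(wall_seconds, baseline) / total_procs


# (partition, total processors, wall time in seconds) taken from the table
runs = [("1x1x2", 2, 25466), ("4x4x4", 64, 1346), ("8x8x4", 256, 337)]
for partition, procs, seconds in runs:
    print(f"{partition}: speedup {speedup(seconds):.3f}, "
          f"scalability {scalability(seconds, procs):.3f}")
```

Under this reading, the 8x8x4 run is about 140 times faster than the single-processor run while using 256 processors, which gives a scalability of roughly 0.55.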

The input file is available here. It is currently configured for a 1x1x1 processor partition. To run a different partition, change the nprocx, nprocy, and nprocz parameters. You must also create a folder named “scratch” in the working directory for the output files. The input file is written for the non-Python version of CitcomS, located at CitcomS-3.2.0/bin/CitcomSRegional.
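Switching between partitions can be scripted. The following Python sketch rewrites the three parameters in the text of an input file and creates the scratch folder. It makes two assumptions not confirmed by this page: that the input file uses plain key=value lines such as nprocx=1, and that a partition label such as 2x2x4 maps to nprocx, nprocy, nprocz in that order. The helper names are hypothetical.

```python
import re
from pathlib import Path


def parse_partition(partition):
    """Split a label such as '2x2x4' into (nprocx, nprocy, nprocz).

    ASSUMPTION: the label order is nprocx, nprocy, nprocz.
    """
    nx, ny, nz = (int(part) for part in partition.lower().split("x"))
    return nx, ny, nz


def set_partition(input_text, partition):
    """Return input_text with nprocx/nprocy/nprocz set for the partition.

    ASSUMPTION: parameters appear as 'key=value' lines in the input file.
    """
    values = dict(zip(("nprocx", "nprocy", "nprocz"), parse_partition(partition)))
    for key, value in values.items():
        pattern = rf"^([ \t]*{key}[ \t]*=[ \t]*)\d+"
        input_text, count = re.subn(
            pattern, rf"\g<1>{value}", input_text, flags=re.MULTILINE
        )
        if count == 0:
            raise ValueError(f"{key} not found in input text")
    return input_text


def ensure_scratch(workdir="."):
    """Create the 'scratch' output folder in the working directory if missing."""
    scratch = Path(workdir) / "scratch"
    scratch.mkdir(parents=True, exist_ok=True)
    return scratch


sample = "nprocx=1\nnprocy=1\nnprocz=1\n"
print(set_partition(sample, "2x2x4"))
```

The total number of MPI processes to launch is then the product of the three values, for example 2 x 2 x 4 = 16.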