F90OPTFLAGS = -O3 -xhost
Name | Description |
---|---|
# of Cores | Number of used CPU cores |
# of Processes | Number of MPI processes |
# of SMP | Number of threads for each process |
$l_{max}$ | Truncation level for spherical harmonics |
$(N_{r},N_{\theta},N_{\phi})$ | Number of grid points in spherical coordinates |
Elapsed | Elapsed (wall clock) time for one time step |
Nonlinear | Elapsed (wall clock) time for nonlinear terms (including communications) |
Solver | Elapsed (wall clock) time for linear calculation |
Comm. | Elapsed (wall clock) time for data communication |
Efficiency | Parallel efficiency |
SUs | Service units (core hours) for $10^{4}$ time steps |
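
The Efficiency and SUs columns can be cross-checked from the Elapsed column. The sketch below assumes two formulas that are not stated on this page but are consistent with the tabulated values: efficiency is the reference core-time divided by the measured core-time, and SUs are the core hours for $10^{4}$ steps. The function names are hypothetical.

```python
def parallel_efficiency(t_ref, n_ref, t, n):
    """Strong-scaling efficiency relative to a reference run.

    Assumed definition: (t_ref * n_ref) / (t * n), where t is the elapsed
    time per step in seconds and n is the number of cores.
    """
    return (t_ref * n_ref) / (t * n)


def service_units(elapsed_per_step, n_cores, n_steps=10_000):
    """Core hours for n_steps time steps (assumed definition of SUs)."""
    return elapsed_per_step * n_cores * n_steps / 3600.0


# Check against the 2-core row of the l_max = 47 strong-scaling table,
# using the 1-core row as the reference.
eff = parallel_efficiency(1.04900, 1, 0.538092, 2)
sus = service_units(0.538092, 2)
print(f"efficiency = {eff:.5f}, SUs = {sus:.4f}")
```

With the 1-core run as reference, this gives about 0.9747 and 2.989 for the 2-core run, matching that row of the table.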
$l_{max}$ | $(N_{r},N_{\theta},N_{\phi})$ | Elapsed | Nonlinear | Solver | Comm. | SUs |
---|---|---|---|---|---|---|
47 | (73,72,144) | 1.604797 | 1.56274 | 0.042059 | 0.508469 | 71.3243 |
$l_{max}$ | $(N_{r},N_{\theta},N_{\phi})$ |
---|---|
47 | (48,72,144) |
# of Cores | # of Processes | # of SMP | Elapsed | Nonlinear | Solver | Comm. | Efficiency | SUs |
---|---|---|---|---|---|---|---|---|
1 | 1 | 1 | 1.04900 | 0.880913 | 0.168083 | 0.360225 | 1.0 | 2.91389 |
2 | 2 | 1 | 0.538092 | 0.453756 | 0.0843343 | 0.194296 | 0.97474 | 2.9894 |
4 | 4 | 1 | 0.274424 | 0.23125 | 0.0431727 | 0.0996035 | 0.955636 | 3.04916 |
8 | 8 | 1 | 0.145301 | 0.122894 | 0.0224057 | 0.0558862 | 0.902437 | 3.22891 |
16 | 16 | 1 | 0.095041 | 0.0821946 | 0.0128446 | 0.0449899 | 0.689833 | 4.22404 |
$l_{max}$ | $(N_{r},N_{\theta},N_{\phi})$ |
---|---|
127 | (256,192,384) |
# of Cores | # of Processes | # of SMP | Elapsed | Nonlinear | Solver | Comm. | Efficiency | SUs |
---|---|---|---|---|---|---|---|---|
4 | 4 | 1 | 17.9454 | 16.918 | 1.02739 | 6.01468 | 1.45157 | 1749.79 |
8 | 8 | 1 | 9.98835 | 9.45138 | 0.536971 | 3.40571 | 1.30397 | 1749.79 |
16 | 16 | 1 | 6.51225 | 6.2059 | 0.00297473 | 2.40569 | 1.0 | 1749.79 |
32 | 32 | 1 | 3.30412 | 3.15372 | 0.00297473 | 1.23981 | 0.985473 | 1749.79 |
64 | 64 | 1 | 1.71612 | 1.64141 | 0.00297473 | 0.6747 | 0.948685 | 1749.79 |
128 | 128 | 1 | 0.91125 | 0.870656 | 0.00297473 | 0.383074 | 0.893313 | 1749.79 |
$l_{max}$ | $(N_{r},N_{\theta},N_{\phi})$ |
---|---|
255 | (513,384,768) |
# of Cores | # of Processes | # of SMP | Elapsed | Nonlinear | Solver | Comm. | Efficiency | SUs |
---|---|---|---|---|---|---|---|---|
128 | 128 | 1 | 10.4146 | 10.0946 | 0.320031 | 4.01002 | 1.0 | 3702.97 |
256 | 256 | 1 | 5.4918 | 5.33942 | 0.152376 | 2.2764 | 0.948195 | 3905.28 |
Elapsed (wall clock) time for strong scaling. The ideal scaling is plotted as a dotted line.
# of Cores | # of Processes | # of SMP | $l_{max}$ | $(N_{r},N_{\theta},N_{\phi})$ | Elapsed | Nonlinear | Solver | Comm. | SUs |
---|---|---|---|---|---|---|---|---|---|
4 | 4 | 1 | 31 | (256,48,96) | 0.367794 | 0.302528 | 0.0652638 | 0.13866 | 4.0866 |
16 | 4 | 1 | 63 | (256,96,192) | 0.829839 | 0.757857 | 0.0719786 | 0.333987 | 36.8817 |
64 | 16 | 1 | 127 | (256,192,384) | 1.71612 | 1.64141 | 0.0747133 | 0.674700 | 305.089 |
256 | 64 | 1 | 255 | (256,384,768) | 2.74791 | 2.67373 | 0.0741758 | 1.13636 | 1954.07 |
Elapsed time for weak scaling in horizontal resolution. The ideal scaling for the Legendre transform is plotted as a dotted line.
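
One way to estimate such an ideal line is sketched below. It assumes that the Legendre transform cost per radial level grows as $(l_{max}+1)^{3}$ and that the work is divided evenly among the cores. This cost model is an assumption, not something stated on this page, and the function name is hypothetical.

```python
def ideal_legendre_time(t_ref, l_ref, n_ref, l_max, n_cores):
    """Ideal elapsed time under an assumed O((l_max + 1)^3) Legendre cost.

    t_ref, l_ref, n_ref: elapsed time, truncation level, and core count
    of the reference run. The radial resolution is assumed to be fixed.
    """
    work_ratio = ((l_max + 1) / (l_ref + 1)) ** 3
    return t_ref * work_ratio * n_ref / n_cores


# Reference: the 4-core, l_max = 31 row of the table above.
runs = [(31, 4), (63, 16), (127, 64), (255, 256)]
ideal = [ideal_legendre_time(0.367794, 31, 4, l, n) for l, n in runs]
for (l, n), t in zip(runs, ideal):
    print(f"l_max = {l:3d}, cores = {n:3d}, ideal elapsed = {t:.6f} s")
```

Under this model the ideal elapsed time doubles each time $l_{max}+1$ doubles and the core count quadruples.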
# of Cores | # of Processes | # of SMP | $l_{max}$ | $(N_{r},N_{\theta},N_{\phi})$ | Elapsed | Nonlinear | Solver | Comm. | SUs |
---|---|---|---|---|---|---|---|---|---|
32 | 32 | 1 | 255 | (64,384,768) | 5.15760 | 4.98965 | 0.167950 | 1.90546 | 458.454 |
64 | 64 | 1 | 255 | (128,384,768) | 5.15654 | 4.99703 | 0.159511 | 1.92557 | 916.718 |
128 | 128 | 1 | 255 | (256,384,768) | 5.30425 | 5.14686 | 0.157383 | 2.07861 | 1885.96 |
256 | 256 | 1 | 255 | (512,384,768) | 5.49180 | 5.33942 | 0.152376 | 2.2764 | 3905.28 |
Elapsed time for weak scaling in radial resolution. The ideal scaling is a constant elapsed time.
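
Because the ideal here is a constant elapsed time, a weak-scaling efficiency can be taken as the ratio of the reference elapsed time to the measured one. The sketch below applies that common definition, which is an assumption and not given on this page, to the Elapsed column of the table above with the 32-core run as reference.

```python
def weak_scaling_efficiency(t_ref, t):
    """Ratio of reference to measured elapsed time (1.0 is ideal)."""
    return t_ref / t


# (cores, elapsed seconds per step) from the radial weak-scaling table.
radial_runs = [(32, 5.15760), (64, 5.15654), (128, 5.30425), (256, 5.49180)]
t_ref = radial_runs[0][1]
weak_eff = {n: weak_scaling_efficiency(t_ref, t) for n, t in radial_runs}
for n, e in weak_eff.items():
    print(f"cores = {n:3d}, weak-scaling efficiency = {e:.4f}")
```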