Compile options

F90OPTFLAGS = -O3 -xhost

Definition of columns

| Name | Description |
| --- | --- |
| # of Cores | Number of CPU cores used |
| # of Processes | Number of MPI processes |
| # of Threads | Number of threads per process (listed as # of SMP in the result tables) |
| $l_{max}$ | Truncation level for spherical harmonics |
| $(N_{r},N_{\theta},N_{\phi})$ | Number of grid points in spherical coordinates |
| Elapsed | Elapsed (wall clock) time for one time step |
| Nonlinear | Elapsed (wall clock) time for nonlinear terms (including communications) |
| Solver | Elapsed (wall clock) time for linear calculation |
| Comm. | Elapsed (wall clock) time for data communication |
| Efficiency | Parallel efficiency |
| SUs | Service units for $10^{4}$ time steps (core hours) |
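
The page does not give formulas for the Efficiency and SUs columns. The sketch below reproduces tabulated values under two assumptions inferred from the numbers themselves: efficiency is measured relative to the run marked 1.0 in the same table, and Elapsed is given in seconds. The function names are illustrative and do not come from the benchmarked code.

```python
def parallel_efficiency(t_ref, n_ref, t, n):
    """Strong-scaling efficiency of a run on n cores relative to a reference run.

    Assumed definition: (t_ref * n_ref) / (t * n).
    """
    return (t_ref * n_ref) / (t * n)


def service_units(elapsed, n_cores, n_steps=10_000):
    """Core hours for n_steps time steps, assuming `elapsed` is seconds per step."""
    return elapsed * n_cores * n_steps / 3600.0


# Values taken from the l_max = 47 strong scaling rows for 1 and 2 cores.
eff = parallel_efficiency(t_ref=1.04900, n_ref=1, t=0.538092, n=2)
sus = service_units(elapsed=0.538092, n_cores=2)
print(f"efficiency = {eff:.5f}, SUs = {sus:.4f}")
```

For the 2-core run this gives values that agree with the tabulated 0.97474 and 2.9894, which is the basis for the two assumptions.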

Single Processor Result

| $l_{max}$ | $(N_{r},N_{\theta},N_{\phi})$ | Elapsed | Nonlinear | Solver | Comm. | SUs |
| --- | --- | --- | --- | --- | --- | --- |
| 47 | (73,72,144) | 1.604797 | 1.56274 | 0.042059 | 0.508469 | 71.3243 |

Strong Scaling Results

$l_{max}$ = 47, $(N_{r},N_{\theta},N_{\phi})$ = (48,72,144)

| # of Cores | # of Processes | # of SMP | Elapsed | Nonlinear | Solver | Comm. | Efficiency | SUs |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 1 | 1 | 1.04900 | 0.880913 | 0.168083 | 0.360225 | 1.0 | 2.91389 |
| 2 | 2 | 1 | 0.538092 | 0.453756 | 0.0843343 | 0.194296 | 0.97474 | 2.9894 |
| 4 | 4 | 1 | 0.274424 | 0.23125 | 0.0431727 | 0.0996035 | 0.955636 | 3.04916 |
| 8 | 8 | 1 | 0.145301 | 0.122894 | 0.0224057 | 0.0558862 | 0.902437 | 3.22891 |
| 16 | 16 | 1 | 0.095041 | 0.0821946 | 0.0128446 | 0.0449899 | 0.689833 | 4.22404 |

$l_{max}$ = 127, $(N_{r},N_{\theta},N_{\phi})$ = (256,192,384)

| # of Cores | # of Processes | # of SMP | Elapsed | Nonlinear | Solver | Comm. | Efficiency | SUs |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 4 | 4 | 1 | 17.9454 | 16.918 | 1.02739 | 6.01468 | 1.45157 | 1749.79 |
| 8 | 8 | 1 | 9.98835 | 9.45138 | 0.536971 | 3.40571 | 1.30397 | 1749.79 |
| 16 | 16 | 1 | 6.51225 | 6.2059 | 0.00297473 | 2.40569 | 1.0 | 1749.79 |
| 32 | 32 | 1 | 3.30412 | 3.15372 | 0.00297473 | 1.23981 | 0.985473 | 1749.79 |
| 64 | 64 | 1 | 1.71612 | 1.64141 | 0.00297473 | 0.6747 | 0.948685 | 1749.79 |
| 128 | 128 | 1 | 0.91125 | 0.870656 | 0.00297473 | 0.383074 | 0.893313 | 1749.79 |

$l_{max}$ = 255, $(N_{r},N_{\theta},N_{\phi})$ = (513,384,768)

| # of Cores | # of Processes | # of SMP | Elapsed | Nonlinear | Solver | Comm. | Efficiency | SUs |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 128 | 128 | 1 | 10.4146 | 10.0946 | 0.320031 | 4.01002 | 1.0 | 3702.97 |
| 256 | 256 | 1 | 5.4918 | 5.33942 | 0.152376 | 2.2764 | 0.948195 | 3905.28 |


Elapsed (wall clock) time for the strong scaling. The ideal scaling is plotted as a dotted line.
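
A minimal sketch of how such an ideal line can be generated for one series, assuming it is anchored at the smallest run and falls off as the inverse of the core count (the page does not state which run anchors the dotted line). The measured values are the $l_{max}$ = 47 strong scaling results; `ideal_elapsed` is a hypothetical helper name.

```python
def ideal_elapsed(t_ref, n_ref, n_cores):
    """Elapsed time under perfect strong scaling from a reference run."""
    return t_ref * n_ref / n_cores


# l_max = 47 strong scaling series from the table above.
cores = [1, 2, 4, 8, 16]
measured = [1.04900, 0.538092, 0.274424, 0.145301, 0.095041]

# Anchor the ideal line at the first (1-core) run; this anchor is an assumption.
ideal = [ideal_elapsed(measured[0], cores[0], n) for n in cores]

for n, t, t_ideal in zip(cores, measured, ideal):
    print(f"{n:3d} cores: measured {t:.6f} s, ideal {t_ideal:.6f} s")
```

The gap between the two columns at 16 cores corresponds to the drop in efficiency reported for that run.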


Parallel Efficiency for the strong scaling.

Weak Scaling Results

| # of Cores | # of Processes | # of SMP | $l_{max}$ | $(N_{r},N_{\theta},N_{\phi})$ | Elapsed | Nonlinear | Solver | Comm. | SUs |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 4 | 4 | 1 | 31 | (256,48,96) | 0.367794 | 0.302528 | 0.0652638 | 0.13866 | 4.0866 |
| 16 | 4 | 1 | 63 | (256,96,192) | 0.829839 | 0.757857 | 0.0719786 | 0.333987 | 36.8817 |
| 64 | 16 | 1 | 127 | (256,192,384) | 1.71612 | 1.64141 | 0.0747133 | 0.674700 | 305.089 |
| 256 | 64 | 1 | 255 | (256,384,768) | 2.74791 | 2.67373 | 0.0741758 | 1.13636 | 1954.07 |


Elapsed time for the weak scaling in the horizontal resolution. The ideal scaling for the Legendre transform is plotted as a dotted line.
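
The page does not define the ideal Legendre transform scaling. One plausible reading, sketched below, assumes that the transform cost grows as $(l_{max}+1)^{3}$ at fixed $N_{r}$ and is divided evenly over the cores. Since the core count in this series grows as $(l_{max}+1)^{2}$, the ideal elapsed time then doubles with each step in resolution. Both the cost model and the helper name `legendre_ideal` are assumptions made for illustration.

```python
# Weak scaling series in the horizontal resolution (values from the table above).
l_max = [31, 63, 127, 255]
cores = [4, 16, 64, 256]
measured = [0.367794, 0.829839, 1.71612, 2.74791]


def legendre_ideal(t_ref, l_ref, n_ref, l, n):
    """Ideal time if work scales as (l + 1)**3 and is split evenly over n cores.

    The cubic cost model is an assumption, not taken from the page.
    """
    work_ratio = ((l + 1) / (l_ref + 1)) ** 3
    return t_ref * work_ratio * n_ref / n


# Anchor the ideal curve at the smallest run (l_max = 31 on 4 cores).
ideal = [
    legendre_ideal(measured[0], l_max[0], cores[0], l, n)
    for l, n in zip(l_max, cores)
]

for l, t, t_ideal in zip(l_max, measured, ideal):
    print(f"l_max = {l:3d}: measured {t:.6f} s, assumed ideal {t_ideal:.6f} s")
```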

| # of Cores | # of Processes | # of SMP | $l_{max}$ | $(N_{r},N_{\theta},N_{\phi})$ | Elapsed | Nonlinear | Solver | Comm. | SUs |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 32 | 32 | 1 | 255 | (64,384,768) | 5.15760 | 4.98965 | 0.167950 | 1.90546 | 458.454 |
| 64 | 64 | 1 | 255 | (128,384,768) | 5.15654 | 4.99703 | 0.159511 | 1.92557 | 916.718 |
| 128 | 128 | 1 | 255 | (256,384,768) | 5.30425 | 5.14686 | 0.157383 | 2.07861 | 1885.96 |
| 256 | 256 | 1 | 255 | (512,384,768) | 5.49180 | 5.33942 | 0.152376 | 2.2764 | 3905.28 |


Elapsed time for the weak scaling in the radial resolution. Ideal scaling corresponds to a constant elapsed time.
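
Because the ideal here is a constant elapsed time, the departure from it can be summarized as a weak-scaling efficiency, taken below as the ratio of the smallest run's elapsed time to each run's elapsed time. This metric is not reported on the page. It is a common convention, used here as an assumption.

```python
# Weak scaling series in the radial resolution (values from the table above).
cores = [32, 64, 128, 256]
elapsed = [5.15760, 5.15654, 5.30425, 5.49180]

# Assumed definition: efficiency = T(smallest run) / T(run); 1.0 means ideal.
weak_eff = [elapsed[0] / t for t in elapsed]

for n, t, e in zip(cores, elapsed, weak_eff):
    print(f"{n:3d} cores: elapsed {t:.5f} s, weak-scaling efficiency {e:.3f}")
```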
