

Compile options

F90OPTFLAGS = -O3 -warn all -g -xhost -openmp

Definition of columns

name
# of Cores: Number of CPU cores used
# of Processes: Number of MPI processes
# of Threads: Number of threads per process
$l_{max}$: Truncation level for spherical harmonics
$(N_{r},N_{\theta},N_{\phi})$: Number of grid points in spherical coordinates
Elapsed: Elapsed (wall clock) time for one time step
Nonlinear: Elapsed (wall clock) time for nonlinear terms (including communications)
Solver: Elapsed (wall clock) time for the linear calculation
Comm.: Elapsed (wall clock) time for data communication
Efficiency: Parallel efficiency
SUs: Service units for $10^{4}$ time steps (core hours)
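The SUs column can be reproduced from the Elapsed column and the core count. The sketch below assumes that elapsed times are in seconds and that one service unit is one core hour; the function name `service_units` is hypothetical, and the formula is inferred from the tabulated values rather than stated on this page.

```python
def service_units(elapsed_per_step, cores, steps=10_000):
    """Core hours needed for `steps` time steps.

    elapsed_per_step: wall clock time per step (assumed to be in seconds)
    cores: total CPU cores = MPI processes x threads per process
    """
    return elapsed_per_step * steps * cores / 3600.0


# Example: 64 MPI processes x 4 threads = 256 cores, 6.82326 s per step
cores = 64 * 4
print(service_units(6.82326, cores))
```

For this run the result agrees with the tabulated value of 4852.09 to within rounding.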

Single Processor Result

$l_{max}$ $(N_{r},N_{\theta},N_{\phi})$ Elapsed Nonlinear Solver Comm. SUs
47 (73,72,144) 1.604797 1.56274 0.042059 0.508469 71.3243

Strong Scaling Results

$l_{max}$ $(N_{r},N_{\theta},N_{\phi})$
255 (513,384,768)
# of Cores # of Processes # of Threads Elapsed Nonlinear Solver Comm. Efficiency SUs
256 32 8 7.25365 7.20543 0.0482177 1.09003 0.940665 5158.15
256 64 4 6.82326 6.7743 0.0489558 0.794641 1 4852.09
256 128 2 6.74711 6.69947 0.0476359 0.72289 1.01129 4797.94
512 64 8 3.57915 3.5559 0.0232448 0.541038 0.953195 5090.34
512 128 4 2.13608 2.11468 0.0214005 0.481005 1.59714 3037.98
512 256 2 2.11996 2.09772 0.0222343 0.430132 1.60929 3015.06
1024 128 8 1.76799 1.75701 0.010974 0.300582 0.964834 5028.94
1024 256 4 1.33388 1.32335 0.01052 0.514778 1.27884 3794.14
1024 512 2 2.31397 2.30589 0.00807709 1.36341 0.73718 6581.96
2048 256 8 0.838836 0.833227 0.00560799 0.195589 1.01677 4772.05
2048 512 4 0.81061 0.805883 0.0047257 0.168418 1.05218 4611.47
2048 1024 2 0.672909 0.668731 0.00417642 0.162079 1.26749 3828.1
4096 512 8 0.386554 0.383808 0.00274551 0.101968 1.10322 4398.13
4096 1024 4 0.3577 0.355273 0.00242621 0.105497 1.19221 4069.83
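The Efficiency column is consistent with taking the 256-core run with 64 processes and 4 threads (Efficiency = 1) as the reference and comparing core-seconds per step. The sketch below is inferred from the tabulated numbers, not from a stated definition; `parallel_efficiency` and `REF_CORES`/`REF_ELAPSED` are hypothetical names.

```python
# Reference run: 256 cores (64 processes x 4 threads), Efficiency = 1 in the table
REF_CORES = 256
REF_ELAPSED = 6.82326


def parallel_efficiency(cores, elapsed, ref_cores=REF_CORES, ref_elapsed=REF_ELAPSED):
    """Ratio of reference core-seconds per step to this run's core-seconds per step."""
    return (ref_elapsed * ref_cores) / (elapsed * cores)


# (cores, processes, threads, elapsed) for a few rows of the strong scaling table
runs = [
    (256, 32, 8, 7.25365),
    (512, 64, 8, 3.57915),
    (1024, 512, 2, 2.31397),
    (4096, 1024, 4, 0.3577),
]

for cores, procs, threads, elapsed in runs:
    assert cores == procs * threads  # cores = processes x threads per process
    print(cores, procs, threads, round(parallel_efficiency(cores, elapsed), 5))
```

With this definition a value above 1 means the run used fewer core-seconds per step than the reference run.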


Elapsed (wall clock) time for the strong scaling. The numeric labels give the number of OpenMP threads. Ideal scaling is plotted as a dotted line.


Parallel efficiency for the strong scaling. The numeric labels give the number of OpenMP threads.

Weak Scaling Results

# of Cores # of Processes # of Threads $l_{max}$ $(N_{r},N_{\theta},N_{\phi})$ Elapsed Nonlinear Solver Comm. SUs
16 4 4 31 (513,48,96) 0.500935 0.489675 0.0112569 0.129805 22.2638
64 16 4 63 (513,96,192) 0.60037 0.591241 0.00912827 0.21034 106.733
256 64 4 127 (513,192,384) 0.730134 0.72047 0.00966272 0.203059 519.207
1024 256 4 255 (513,384,768) 1.33388 1.32335 0.01052 0.514778 3794.14
4096 1024 4 511 (513,768,1536) 1.94168 1.93247 0.00921192 0.57891 22092


Elapsed time for weak scaling in the horizontal resolution. Results with 4 OpenMP threads are shown.

# of Cores # of Processes # of Threads $l_{max}$ $(N_{r},N_{\theta},N_{\phi})$ Elapsed Nonlinear Solver Comm. SUs
128 32 4 255 (33,384,768) 0.536611 0.53181 0.00479917 0.718157 190.795
256 64 4 255 (65,384,768) 0.69748 0.69268 0.00479858 0.171752 495.986
512 128 4 255 (129,384,768) 0.694585 0.0689725 0.00444602 0.720039 987.854
1024 256 4 255 (257,384,768) 0.809201 0.804243 0.0049558 0.203059 2301.73
2048 512 4 255 (513,384,768) 0.81061 0.805883 0.0047257 0.168418 4611.47
4096 1024 4 255 (1025,384,768) 0.809201 0.804471 0.00472797 0.17441 9206.91


Elapsed time for weak scaling in the radial resolution. Results with 4 OpenMP threads are shown.
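In a weak scaling test the work per core is held roughly fixed while the problem grows. As a quick check, the sketch below computes the number of grid points per core for both weak scaling tables, using the core counts and grid sizes listed above. Using grid points per core as the measure of workload is an assumption made here for illustration, and `points_per_core` is a hypothetical helper.

```python
def points_per_core(cores, grid):
    """Grid points (N_r * N_theta * N_phi) divided by the number of cores."""
    n_r, n_theta, n_phi = grid
    return n_r * n_theta * n_phi / cores


# (cores, (N_r, N_theta, N_phi)) from the horizontal-resolution table
horizontal = [
    (16, (513, 48, 96)),
    (64, (513, 96, 192)),
    (256, (513, 192, 384)),
    (1024, (513, 384, 768)),
    (4096, (513, 768, 1536)),
]

# (cores, (N_r, N_theta, N_phi)) from the radial-resolution table
radial = [
    (128, (33, 384, 768)),
    (256, (65, 384, 768)),
    (512, (129, 384, 768)),
    (1024, (257, 384, 768)),
    (2048, (513, 384, 768)),
    (4096, (1025, 384, 768)),
]

for label, table in (("horizontal", horizontal), ("radial", radial)):
    for cores, grid in table:
        print(label, cores, grid, points_per_core(cores, grid))
```

The horizontal series keeps the grid points per core exactly constant, while the radial series varies by a few percent because $N_{r}$ is of the form $2^{k}+1$.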
