Compile options

F90OPTFLAGS = -r8 -i4 -ftz -IPF_fma -IPF_fltacc -WB -O2

Definition of columns

name
# of Cores Number of CPU cores used
# of Processes Number of MPI processes
# of Threads Number of threads for each process
$N_{c}$ Truncation level for Chebyshev polynomials
$l_{max}$ Truncation level for spherical harmonics
$(N_{r},N_{\theta},N_{\phi})$ Number of grid points in spherical coordinates
Elapsed Elapsed (wall clock) time for one time step
Nonlinear Elapsed (wall clock) time for nonlinear terms (including communications)
Solver Elapsed (wall clock) time for linear calculation
Comm. Elapsed (wall clock) time for data communication
Efficiency Parallel efficiency
SUs Service units for $10^{4}$ time steps (core hours)
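
The SUs column can be reproduced from the Elapsed column as a rough check. The sketch below assumes SUs = Elapsed × 10^4 steps × charged cores / 3600. The function names are hypothetical. The rounding of the core count up to a multiple of 16 is an inference from the tabulated values (runs whose core count is not a multiple of 16 only match after this rounding); it is not stated on this page.

```python
import math

def charged_cores(cores_used, cores_per_node=16):
    """Round the core count up to whole nodes.

    Assumption: 16 cores per node, inferred from the tabulated SUs,
    not stated in the benchmark description.
    """
    return math.ceil(cores_used / cores_per_node) * cores_per_node

def service_units(elapsed_per_step, cores, n_steps=10_000):
    """Core hours for n_steps time steps at elapsed_per_step seconds each."""
    return elapsed_per_step * n_steps * cores / 3600.0

# 32-core strong scaling run: Elapsed = 0.719657 s per step
print(service_units(0.719657, charged_cores(32)))

# 17-core weak scaling run: Elapsed = 0.359973 s per step, charged as 32 cores
print(service_units(0.359973, charged_cores(17)))
```

With these assumptions the two calls give values close to the tabulated 63.9695 and 31.9976 core hours.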

Two Processor Result

$N_{c}$ $l_{max}$ $(N_{r},N_{\theta},N_{\phi})$ Elapsed Nonlinear Solver Comm. SUs

Strong Scaling Results

$N_{c}$ $l_{max}$ $(N_{r},N_{\theta},N_{\phi})$
192 170 (193,256,256)
# of Cores # of Processes # of SMP Elapsed Nonlinear Solver Comm. Efficiency SUs
32 32 1 0.719657 0.518729 0.200927 0.182378 1 63.9695
64 64 1 0.545835 0.336805 0.209030 0.21079 0.659226 97.0373
128 128 1 0.285552 0.19960 0.085952 0.174127 0.630058 101.529
$N_{c}$ $l_{max}$ $(N_{r},N_{\theta},N_{\phi})$
256 341 (257,512,512)
# of Cores # of Processes # of SMP Elapsed Nonlinear Solver Comm. Efficiency SUs
86 86 1 2.47102 1.88142 0.589604 0.379399 1 658.939
128 128 1 2.32883 1.91045 0.418383 0.857392 0.712896 828.03
129 129 1 2.09683 1.68765 0.409184 0.663364 0.785635 838.734
256 256 1 1.41293 1.13385 0.279076 0.675334 0.58751 1004.75
257 257 1 1.33908 1.06534 0.273739 0.621596 0.61991 1011.75


Elapsed (wall clock) time for the strong scaling. The numbers in the plot show the number of OpenMP threads. The dotted line shows ideal scaling.


Parallel Efficiency for the strong scaling.
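
The Efficiency column of the strong scaling tables is consistent with the usual definition relative to the smallest run, (T_ref × N_ref) / (T × N). The sketch below applies that definition to the three runs with $N_{c}$ = 192 and $l_{max}$ = 170. The function name is hypothetical, and the choice of the smallest run as the reference is inferred from its tabulated efficiency of 1.

```python
def parallel_efficiency(ref_cores, ref_elapsed, cores, elapsed):
    """Strong scaling efficiency relative to a reference run.

    A value of 1 means the elapsed time dropped in proportion
    to the increase in core count.
    """
    return (ref_elapsed * ref_cores) / (elapsed * cores)

# (cores, elapsed seconds per step) taken from the strong scaling table
runs = [(32, 0.719657), (64, 0.545835), (128, 0.285552)]

ref_cores, ref_elapsed = runs[0]  # smallest run is the reference
for cores, elapsed in runs:
    eff = parallel_efficiency(ref_cores, ref_elapsed, cores, elapsed)
    print(f"{cores:4d} cores: efficiency {eff:.6f}")
```

The results agree with the tabulated values 1, 0.659226 and 0.630058 to within rounding.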

Weak Scaling Results

# of Cores # of Processes # of SMP $N_{c}$ $l_{max}$ $(N_{r},N_{\theta},N_{\phi})$ Elapsed Nonlinear Solver Comm. SUs
4 4 1 256 42 (257,64,64) 0.322194 0.165516 0.156679 0.0525075 14.3197
16 16 1 256 85 (257,128,128) 0.406717 0.204322 0.202394 0.0716188 18.0763
64 64 1 256 170 (257,256,256) 0.743692 0.44779 0.295902 0.312471 132.212
256 256 1 256 341 (257,512,512) 1.41293 1.13385 0.279076 0.675334 1004.75
5 5 1 256 42 (257,64,64) 0.252394 0.127396 0.124998 0.0395718 11.2175
17 17 1 256 85 (257,128,128) 0.359973 0.180849 0.179124 0.0811399 31.9976
65 65 1 256 170 (257,256,256) 0.639472 0.363557 0.275915 0.227634 142.105
257 257 1 256 341 (257,512,512) 1.33908 1.06534 0.273739 0.621596 1011.75


Elapsed time for the weak scaling in the horizontal directions. The dotted line shows $O(N_{core}^{1/2})$ scaling (the ideal scaling for the Legendre transform).
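
To compare the measurements with the square-root reference curve numerically, the measured elapsed times can be set against T_ref × (N / N_ref)^{1/2}. The sketch below does this for the 4, 16, 64 and 256 core runs. Anchoring the reference curve at the smallest run is an assumption made here for illustration; the page does not say how the dotted line in the plot was normalized. The function name is hypothetical.

```python
import math

def sqrt_reference(ref_cores, ref_elapsed, cores):
    """Elapsed time expected if time grows as sqrt(cores),
    anchored at a reference run (normalization is an assumption)."""
    return ref_elapsed * math.sqrt(cores / ref_cores)

# (cores, elapsed seconds per step) from the horizontal weak scaling table
measured = [(4, 0.322194), (16, 0.406717), (64, 0.743692), (256, 1.41293)]

ref_cores, ref_elapsed = measured[0]
for cores, elapsed in measured:
    ref = sqrt_reference(ref_cores, ref_elapsed, cores)
    print(f"{cores:4d} cores: measured {elapsed:.6f} s, sqrt reference {ref:.6f} s")
```

With this normalization the measured times stay below the reference curve for all four runs.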

# of Cores # of Processes # of SMP $N_{c}$ $l_{max}$ $(N_{r},N_{\theta},N_{\phi})$ Elapsed Nonlinear Solver Comm. SUs
8 8 1 16 341 (17,512,512) 1.28171 1.20689 0.0748194 0.372245 56.9651
16 16 1 32 341 (33,512,512) 1.52159 1.42569 0.0958951 0.315524 67.6261
32 32 1 64 341 (65,512,512) 1.61663 1.45842 0.158208 0.389299 143.7
64 64 1 128 341 (129,512,512) 1.75436 1.50559 0.248772 0.453414 311.887
128 128 1 256 341 (257,512,512) 2.32883 1.91045 0.418383 0.857392 828.03
9 9 1 16 341 (17,512,512) 1.05288 0.993848 0.0590345 0.207514 46.7948
17 17 1 32 341 (33,512,512) 1.03108 0.94969 0.0813889 0.194925 91.6515
33 33 1 64 341 (65,512,512) 1.25166 1.10257 0.149091 0.218563 166.888
65 65 1 128 341 (129,512,512) 1.43519 1.19399 0.241198 0.262446 318.931
129 129 1 256 341 (257,512,512) 2.09683 1.68765 0.409184 0.663364 838.734


Elapsed time for the weak scaling in the radial direction. The dotted line shows $O(N_{core}^{1/2})$ scaling (the ideal scaling for the Legendre transform).
