Compile options

F90OPTFLAGS = -r8 -i4 -ftz -IPF_fma -IPF_fltacc -WB -O3 -xhosts

Definition of columns

| Name | Description |
|---|---|
| # of Cores | Number of CPU cores used |
| # of Processes | Number of MPI processes |
| # of Threads | Number of threads per process |
| $N_{c}$ | Truncation level for Chebyshev polynomials |
| $l_{max}$ | Truncation level for spherical harmonics |
| $(N_{r},N_{\theta},N_{\phi})$ | Number of grid points in spherical coordinates |
| Elapsed | Elapsed (wall clock) time for one time step |
| Legendre | Elapsed (wall clock) time for the Legendre transform |
| Implicit | Elapsed (wall clock) time for the linear calculation |
| Efficiency | Parallel efficiency |
| SUs | Service units for $10^{4}$ time steps (core hours) |
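
The SUs column can be reproduced from the Elapsed column with a short calculation. The sketch below assumes that Elapsed is in seconds and that one service unit is one core hour. The function name `service_units` and the `cores_per_node` parameter are hypothetical helpers, not part of the benchmark code. The whole-node rounding is an inference from the weak scaling rows, which match a 16-core allocation unit; the page itself does not state it.

```python
import math

def service_units(elapsed_s, cores, steps=10**4, cores_per_node=1):
    """Core hours needed for `steps` time steps at `elapsed_s` seconds per step."""
    # Round the core count up to whole nodes (assumption; no-op when cores_per_node == 1).
    billed_cores = math.ceil(cores / cores_per_node) * cores_per_node
    return elapsed_s * billed_cores * steps / 3600.0

# 16-core strong scaling run: 7.8559 s per step
print(round(service_units(7.8559, 16), 3))                    # 349.151
# 3-core weak scaling run, assumed to be charged as one 16-core node
print(round(service_units(2.3095, 3, cores_per_node=16), 3))  # 102.644
```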

Single Processor Result

| $N_{c}$ | $l_{max}$ | $(N_{r},N_{\theta},N_{\phi})$ | Elapsed | Legendre | Implicit | LUdecomp | SUs |
|---|---|---|---|---|---|---|---|
| 71 | 47 | (73,72,144) | 0.96659 | 0.57970 | 0.13313 | 0.010014 | 2.6849 |

Strong Scaling Results

| $N_{c}$ | $l_{max}$ | $(N_{r},N_{\theta},N_{\phi})$ |
|---|---|---|
| 191 | 255 | (193,384,768) |

| # of Cores | # of Processes | # of SMP | Elapsed | Legendre | Implicit | Efficiency | SUs |
|---|---|---|---|---|---|---|---|
| 16 | 16 | 1 | 7.8559 | 3.1132 | 1.0993 | 1.0 | 349.151 |
| 32 | 32 | 1 | 4.4581 | 1.5484 | 0.67073 | 0.881082 | 396.276 |
| 64 | 64 | 1 | 3.4032 | 0.77098 | 0.68621 | 0.577097 | 605.013 |
| 128 | 128 | 1 | 1.0696 | 0.37643 | 0.14921 | 0.918089 | 380.302 |

| $N_{c}$ | $l_{max}$ | $(N_{r},N_{\theta},N_{\phi})$ |
|---|---|---|
| 255 | 511 | (257,768,1536) |

| # of Cores | # of Processes | # of SMP | Elapsed | Legendre | Implicit | Efficiency | SUs |
|---|---|---|---|---|---|---|---|
| 64 | 64 | 1 | 13.018 | 4.7327 | 1.9132 | 1.0 | 414.015 |
| 128 | 128 | 1 | 8.7973 | 2.3534 | 1.7398 | 0.555322 | 745.541 |
| 256 | 256 | 1 | 8.678 | 1.1378 | 4.3325 | 0.412058 | 1004.75 |


Elapsed (wall clock) time for strong scaling. The numbers indicate the number of OpenMP threads. Ideal scaling is shown by the dotted line.


Parallel efficiency for strong scaling.
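
The Efficiency column of the first strong scaling table ($N_{c}=191$, $l_{max}=255$) is consistent with the usual definition: reference time times reference core count, divided by measured time times core count, with the smallest run as the reference. The sketch below assumes that definition. `parallel_efficiency` is a hypothetical helper name, and the timings are copied from that table.

```python
def parallel_efficiency(t_ref, n_ref, t, n):
    """Strong scaling efficiency relative to a reference run (t_ref seconds on n_ref cores)."""
    return (t_ref * n_ref) / (t * n)

# (cores, elapsed seconds per step) from the N_c = 191, l_max = 255 table
runs = [(16, 7.8559), (32, 4.4581), (64, 3.4032), (128, 1.0696)]

n_ref, t_ref = runs[0]  # the smallest run serves as the reference (efficiency 1.0)
for n, t in runs:
    print(n, round(parallel_efficiency(t_ref, n_ref, t, n), 6))
```

With ideal scaling the elapsed time halves each time the core count doubles, so the efficiency stays at 1.0.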

Weak Scaling Results

| # of Cores | # of Processes | # of SMP | $N_{c}$ | $l_{max}$ | $(N_{r},N_{\theta},N_{\phi})$ | Elapsed | Legendre | Implicit | SUs |
|---|---|---|---|---|---|---|---|---|---|
| 2 | 2 | 1 | 255 | 63 | (257,96,192) | 3.3257 | 1.8103 | 0.47442 | 147.809 |
| 8 | 8 | 1 | 255 | 127 | (257,192,384) | 4.0801 | 1.8754 | 0.51211 | 181.338 |
| 32 | 32 | 1 | 255 | 255 | (257,384,768) | 5.8172 | 2.0489 | 0.8497 | 517.084 |
| 128 | 128 | 1 | 255 | 511 | (257,768,1536) | 9.5023 | 2.3534 | 1.7398 | 3378.6 |
| 3 | 3 | 1 | 255 | 63 | (257,96,192) | 2.3095 | 1.1811 | 0.3029 | 102.644 |
| 9 | 9 | 1 | 255 | 127 | (257,192,384) | 3.5408 | 1.7574 | 0.46589 | 157.369 |
| 33 | 33 | 1 | 255 | 255 | (257,384,768) | 5.528 | 1.7904 | 0.77925 | 737.067 |
| 129 | 129 | 1 | 255 | 511 | (257,768,1536) | 8.9802 | 1.9042 | 1.7902 | 3592.1 |


Elapsed time for weak scaling in the horizontal resolution. The dotted line shows $O(N_{core}^{1/2})$ scaling.

| # of Cores | # of Processes | # of SMP | $N_{c}$ | $l_{max}$ | $(N_{r},N_{\theta},N_{\phi})$ | Elapsed | Legendre | Implicit | SUs |
|---|---|---|---|---|---|---|---|---|---|
| 16 | 16 | 1 | 31 | 511 | (33,768,1536) | 5.1154 | 2.5566 | 0.49331 | 227.351 |
| 32 | 32 | 1 | 63 | 511 | (65,768,1536) | 6.0620 | 2.4221 | 0.76595 | 538.844 |
| 64 | 64 | 1 | 127 | 511 | (129,768,1536) | 6.9065 | 2.3967 | 1.1075 | 1227.82 |
| 128 | 128 | 1 | 255 | 511 | (257,768,1536) | 9.5023 | 2.3534 | 1.7398 | 3378.6 |
| 17 | 17 | 1 | 31 | 511 | (33,768,1536) | 4.2943 | 2.4266 | 0.40772 | 381.716 |
| 33 | 33 | 1 | 63 | 511 | (65,768,1536) | 5.6936 | 2.2617 | 0.67252 | 759.147 |
| 65 | 65 | 1 | 127 | 511 | (129,768,1536) | 6.7351 | 2.1191 | 1.0571 | 1496.69 |
| 129 | 129 | 1 | 255 | 511 | (257,768,1536) | 8.9802 | 1.9042 | 1.7902 | 3592.08 |


Elapsed time for weak scaling in the radial resolution. The dotted line shows $O(N_{core})$ scaling.
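
The dotted reference lines in the two weak scaling figures are power laws in the core count. A minimal sketch of how such a line can be computed follows, assuming it is anchored at the smallest run of each series. The page does not say how the plotted lines are normalized, so the anchor point is an assumption, and `reference_time` is a hypothetical helper name.

```python
def reference_time(t0, n0, n, exponent):
    """Power-law reference curve t0 * (n / n0)**exponent through the point (n0, t0)."""
    return t0 * (n / n0) ** exponent

# Horizontal resolution series: O(Ncore^(1/2)), anchored at 2 cores, 3.3257 s
for n in (2, 8, 32, 128):
    print("horizontal", n, round(reference_time(3.3257, 2, n, 0.5), 4))

# Radial resolution series: O(Ncore), anchored at 16 cores, 5.1154 s
for n in (16, 32, 64, 128):
    print("radial", n, round(reference_time(5.1154, 16, n, 1.0), 4))
```

Comparing these values with the measured Elapsed column shows how far each series stays below its reference line.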
