
Compile options

F90OPTFLAGS = -O3 -xhost

Notes

More than two processes are required
A time step adjustment routine is implemented

Definition of columns

Name                             Description
# of Cores                       Number of CPU cores used
# of Processes                   Number of MPI processes
# of Threads                     Number of threads per process
$l_{max}$                        Truncation level for spherical harmonics
$(N_{r},N_{\theta},N_{\phi})$    Number of grid points in spherical coordinates
Elapsed                          Elapsed (wall clock) time for one time step
Nonlinear                        Elapsed (wall clock) time for nonlinear terms (including communications)
Solver                           Elapsed (wall clock) time for linear calculation
CFL                              Elapsed (wall clock) time for CFL condition check
Efficiency                       Parallel efficiency
SUs                              Service units for $10^{4}$ time steps (core hours)
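
The SUs column can be related to the Elapsed column with a short calculation. The sketch below is an inference from the tabulated numbers, not a documented formula: the $l_{max}$ = 63 strong scaling rows and the weak scaling rows match the wall clock hours for $10^{4}$ steps multiplied by the core count rounded up to whole 16-core nodes. The node size of 16 cores and the function name `service_units` are assumptions introduced for illustration.

```python
import math


def service_units(elapsed_per_step, cores, steps=10_000, cores_per_node=16):
    """Core hours for `steps` time steps, charging whole nodes.

    cores_per_node=16 is an assumption inferred from the tabulated SUs.
    """
    # Round the core count up to a whole number of nodes.
    charged_cores = cores_per_node * math.ceil(cores / cores_per_node)
    # Seconds per step -> hours for all steps, times the charged cores.
    return elapsed_per_step * steps / 3600.0 * charged_cores


# (cores, elapsed, tabulated SUs) from the l_max = 63, (124,96,192) case
rows = [(4, 4.49747, 199.888), (32, 0.765525, 68.0467), (126, 0.300067, 106.69)]
for cores, elapsed, tabulated in rows:
    print(cores, round(service_units(elapsed, cores), 3), tabulated)
```

Under these assumptions, a run on fewer than 16 cores is still charged for 16, which is why the SUs in these rows do not decrease in proportion to the elapsed time at small core counts.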

Two Processes Result

$l_{max}$  $(N_{r},N_{\theta},N_{\phi})$  Elapsed  Nonlinear  Solver    SUs
47         (72,72,144)                    1.32979  0.510249   0.819544  46.7169

Strong Scaling Results

$l_{max}$  $(N_{r},N_{\theta},N_{\phi})$
47         (72,72,144)

# of Cores  # of Processes  # of SMP  Elapsed   Nonlinear  Solver     Efficiency  SUs
2           1               1         1.32979   0.510249   0.819544   1           46.7169
4           3               1         0.720192  0.2817     0.438491   0.923222    32.7459
8           7               1         0.375957  0.155108   0.220847   0.884273    29.8152
16          15              1         0.248631  0.104707   0.143922   0.668557    35.9047
25          24              1         0.158468  0.0812083  0.0772578  0.671326    35.9047

$l_{max}$  $(N_{r},N_{\theta},N_{\phi})$
63         (124,96,192)

# of Cores  # of Processes  # of SMP  Elapsed   Nonlinear  Solver    Efficiency  SUs
4           3               1         4.49747   0.942879   3.55459   1.23093     199.888
8           7               1         2.2319    0.59711    1.63479   1.24022     99.1957
16          15              1         1.38402   0.407052   0.97697   1           61.5122
32          31              1         0.765525  0.245143   0.52038   0.903971    68.0467
64          63              1         0.500593  0.152387   0.348203  0.691192    88.9943
126         125             1         0.300067  0.123445   0.176621  0.585699    106.69

$l_{max}$  $(N_{r},N_{\theta},N_{\phi})$
63         (128,96,192)

# of Cores  # of Processes  # of SMP  Elapsed   Nonlinear  Solver    Efficiency  SUs
4           3               1         4.33759   0.982332   3.35526   1.40591     192.782
8           7               1         2.38804   0.611198   1.77684   1.27684     106.135
16          15              1         1.52457   0.473646   1.05092   1           67.7585
32          31              1         0.839238  0.257673   0.581563  0.908304    74.599
64          63              1         0.5472    0.179547   0.367651  0.696531    97.28
128         127             1         0.335634  0.148279   0.187353  0.567794    119.336


Elapsed (wall clock) time for the strong scaling. Ideal scaling is plotted as a dotted line.


Parallel Efficiency for the strong scaling.
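
The Efficiency values in the strong scaling tables are consistent with the usual definition relative to a reference run (the row whose efficiency is 1), $E = (T_{ref} N_{ref}) / (T N)$, where $T$ is the elapsed time per step and $N$ is the number of cores. The following is a minimal sketch under that assumption; the function name `parallel_efficiency` is hypothetical, and the inputs are taken from the $l_{max}$ = 63, (124,96,192) case with the 16-core run as reference.

```python
def parallel_efficiency(elapsed, cores, ref_elapsed, ref_cores):
    """Strong-scaling efficiency relative to a reference run.

    Equals 1 when the core-seconds per step match the reference run.
    """
    return (ref_elapsed * ref_cores) / (elapsed * cores)


# Reference run: 16 cores, 1.38402 s per step (efficiency 1 in the table).
ref_cores, ref_elapsed = 16, 1.38402

# (cores, elapsed) pairs from the same table.
runs = [(4, 4.49747), (32, 0.765525), (64, 0.500593), (126, 0.300067)]
for cores, elapsed in runs:
    eff = parallel_efficiency(elapsed, cores, ref_elapsed, ref_cores)
    print(cores, round(eff, 4))
```

With this definition, values above 1 for runs smaller than the reference simply mean that those runs used fewer core-seconds per step than the reference run.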

Weak Scaling Results

# of Cores  # of Processes  # of SMP  $l_{max}$  $(N_{r},N_{\theta},N_{\phi})$  Elapsed   Nonlinear  Solver    SUs
3           2               1         15         (124,24,48)                    0.26251   0.0325999  0.229909  11.6671
9           8               1         31         (124,48,96)                    0.383009  0.066082   0.316925  17.0226
32          31              1         63         (124,96,192)                   0.765525  0.245143   0.52038   68.0467
125         124             1         127        (124,192,384)                  1.55669   0.793595   0.763088  553.488


Elapsed time for the weak scaling in the horizontal resolution.
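
In a weak scaling test the problem size grows together with the core count. One way to see how the load per core changes across the horizontal-resolution runs is to divide the total number of grid points by the number of cores. This is only an illustrative sketch: the helper name `grid_points_per_core` is hypothetical, and grid points are a rough proxy for the actual work, which also depends on the spectral truncation.

```python
def grid_points_per_core(n_r, n_theta, n_phi, cores):
    """Total grid points divided by the number of cores."""
    return n_r * n_theta * n_phi / cores


# (cores, (N_r, N_theta, N_phi)) from the horizontal weak-scaling table
cases = [
    (3, (124, 24, 48)),
    (9, (124, 48, 96)),
    (32, (124, 96, 192)),
    (125, (124, 192, 384)),
]
for cores, grid in cases:
    print(cores, grid, round(grid_points_per_core(*grid, cores)))
```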

# of Cores  # of Processes  # of SMP  $l_{max}$  $(N_{r},N_{\theta},N_{\phi})$  Elapsed   Nonlinear  Solver     SUs
16          15              1         63         (30,96,192)                    0.164901  0.117507   0.0473918  7.32893
32          31              1         63         (62,96,192)                    0.238942  0.138128   0.100811   21.2393
64          63              1         63         (124,96,192)                   0.500593  0.152387   0.348203   88.9943
127         126             1         63         (248,96,192)                   1.19851   0.235588   0.962918   426.137


Elapsed time for the weak scaling in the radial resolution. Ideal scaling for the linear solver after LU decomposition is plotted as a dotted line.
