wg:dynamo:performance_results:magic5

Back to performance benchmark lists

compile options

F90OPTFLAGS = -O3 -warn all -g -xhost -openmp

Definition of columns

| Name | Definition |
|---|---|
| # of Cores | Number of CPU cores used |
| # of Processes | Number of MPI processes |
| # of Threads | Number of threads per process (labeled "# of SMP" in the result tables) |
| $N_{C}$ | Truncation level for Chebyshev polynomials |
| $l_{max}$ | Truncation level for spherical harmonics |
| $(N_{r},N_{\theta},N_{\phi})$ | Number of grid points in spherical coordinates |
| Elapsed | Elapsed (wall clock) time for one time step |
| Nonlinear | Elapsed (wall clock) time for nonlinear terms (including communications) |
| Solver | Elapsed (wall clock) time for the linear calculation |
| Comm. | Elapsed (wall clock) time for data communication |
| Efficiency | Parallel efficiency |
| SUs | Service units for $10^{4}$ time steps (core hours) |
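The SUs values in the strong scaling table are consistent with the elapsed time per step multiplied by $10^{4}$ steps and by the number of cores, converted to hours. The following is a minimal Python sketch under that assumption; it also assumes the elapsed times are in seconds, and the function name `service_units` is hypothetical rather than part of the benchmarked code.

```python
def service_units(elapsed_per_step, n_cores, n_steps=10_000):
    """Core hours used by n_steps time steps.

    elapsed_per_step is assumed to be wall clock seconds per step.
    """
    seconds_per_hour = 3600.0
    return elapsed_per_step * n_steps * n_cores / seconds_per_hour


# Strong scaling run on 8 cores (2 processes x 4 threads), 9.98327 per step.
print(round(service_units(9.98327, 8), 2))
```

For the 8-core run with 2 processes and 4 threads this reproduces the tabulated 221.85 core hours.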

Single Processor Result

| $N_{C}$ | $l_{max}$ | $(N_{r},N_{\theta},N_{\phi})$ | Elapsed | Nonlinear | Solver | Comm. | SUs |
|---|---|---|---|---|---|---|---|
| 72 | 47 | (73,72,144) | 0.353396 | 0.249238 | 0.965605 | 0.00720803 | 15.7065 |

Strong Scaling Results

| $N_{C}$ | $l_{max}$ | $(N_{r},N_{\theta},N_{\phi})$ |
|---|---|---|
| 192 | 255 | (192,384,768) |

| # of Cores | # of Processes | # of SMP | Elapsed | Nonlinear | Solver | Comm. | Efficiency | SUs |
|---|---|---|---|---|---|---|---|---|
| 8 | 2 | 4 | 9.98327 | 6.70605 | 2.44661 | 0.413657 | 1.3892 | 221.85 |
| 8 | 4 | 2 | 9.21265 | 6.62681 | 2.19377 | 0.230439 | 1.5054 | 204.726 |
| 16 | 2 | 8 | 7.94273 | 5.06533 | 2.13615 | 0.538236 | 0.873047 | 353.01 |
| 16 | 4 | 4 | 6.93438 | 4.67925 | 1.80959 | 0.298254 | 1 | 308.195 |
| 16 | 8 | 2 | 7.12601 | 5.21069 | 1.70288 | 0.245879 | 0.973108 | 316.712 |
| 32 | 4 | 8 | 3.91464 | 2.45256 | 1.01179 | 0.220971 | 0.885697 | 347.968 |
| 32 | 8 | 4 | 3.65146 | 2.36565 | 0.96631 | 0.243386 | 0.949535 | 324.574 |
| 32 | 16 | 2 | 3.76134 | 2.62796 | 0.861258 | 0.165907 | 0.921796 | 334.342 |
| 64 | 8 | 8 | 2.01431 | 1.22438 | 0.465491 | 0.146756 | 0.860641 | 358.099 |
| 64 | 16 | 4 | 1.942 | 1.18073 | 0.465075 | 0.188362 | 0.892685 | 345.245 |
| 64 | 32 | 2 | 2.0975 | 1.33229 | 0.360439 | 0.108833 | 0.826505 | 372.889 |
| 128 | 16 | 8 | 1.15653 | 0.633651 | 0.241426 | 0.119156 | 0.749479 | 411.212 |
| 128 | 32 | 4 | 1.0925 | 0.591954 | 0.182997 | 0.0788043 | 0.793407 | 388.445 |
| 128 | 64 | 2 | 1.46869 | 0.662761 | 0.175857 | 0.0781756 | 0.590182 | 522.202 |
| 256 | 32 | 8 | 0.713037 | 0.318539 | 0.103698 | 0.0459037 | 0.607821 | 507.049 |
| 256 | 64 | 4 | 0.799634 | 0.298149 | 0.0985314 | 0.0466445 | 0.541996 | 568.629 |
| 512 | 64 | 8 | 0.597079 | 0.156069 | 0.0687971 | 0.0383883 | 0.362932 | 849.18 |


Elapsed (wall clock) time for the strong scaling runs. The numeric labels give the number of OpenMP threads. The dotted line shows ideal scaling.


Parallel efficiency for the strong scaling runs. The fastest result on 16 cores (one node) is taken as the reference. The numeric labels give the number of OpenMP threads.
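The tabulated efficiencies are consistent with comparing core-seconds per step against the 16-core reference run (4 processes, 4 threads, 6.93438 elapsed). A minimal Python sketch of that calculation follows; the function name `parallel_efficiency` and its default arguments are illustrative assumptions, not taken from the benchmark scripts.

```python
def parallel_efficiency(elapsed, n_cores, ref_elapsed=6.93438, ref_cores=16):
    """Strong scaling efficiency relative to a reference run.

    Ratio of the reference cost (time x cores) to the cost of this run.
    The defaults are the fastest 16-core result in the strong scaling table.
    """
    return (ref_elapsed * ref_cores) / (elapsed * n_cores)


# (cores, elapsed) pairs copied from the strong scaling table.
for n_cores, elapsed in [(8, 9.98327), (16, 6.93438), (512, 0.597079)]:
    print(n_cores, round(parallel_efficiency(elapsed, n_cores), 4))
```

Because the reference is a 16-core run rather than a single core, the 8-core runs come out above 1.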

Weak Scaling Results

| # of Cores | # of Processes | # of SMP | $N_{C}$ | $l_{max}$ | $(N_{r},N_{\theta},N_{\phi})$ | Elapsed | Nonlinear | Solver | Comm. | SUs |
|---|---|---|---|---|---|---|---|---|---|---|
| 4 | 1 | 4 | 256 | 31 | (257,48,96) | 0.239627 | 0.117565 | 0.108877 | 0.01385 | 22.2638 |
| 16 | 4 | 4 | 256 | 31 | (257,96,192) | 0.265171 | 0.128896 | 0.0825426 | 0.000955493 | 22.2638 |
| 64 | 16 | 4 | 256 | 63 | (257,192,384) | 0.410822 | 0.223191 | 0.110366 | 0.0318171 | 106.733 |
| 256 | 64 | 4 | 256 | 127 | (257,384,768) | 1.06998 | 0.398631 | 0.134563 | 0.0538146 | 519.207 |


Elapsed time for weak scaling in the horizontal resolution. Results with 4 OpenMP threads are shown. The dotted line shows $O(N_{core}^{1/2})$ scaling (the ideal scaling for the Legendre transform).
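A power-law reference curve such as the dotted line can be sketched as $t(N) = t_{0}(N/N_{0})^{p}$ with $p = 1/2$. The Python sketch below anchors the curve at the 4-core run; that anchor point and the function name `reference_time` are assumptions, since the page does not say how the plotted line was normalized.

```python
def reference_time(n_cores, base_time, base_cores, exponent):
    """Reference elapsed time t0 * (N / N0) ** exponent."""
    return base_time * (n_cores / base_cores) ** exponent


# Measured elapsed times from the horizontal weak scaling table.
measured = {4: 0.239627, 16: 0.265171, 64: 0.410822, 256: 1.06998}

base_cores = 4  # assumed anchor: the smallest run
for n_cores, elapsed in measured.items():
    ideal = reference_time(n_cores, measured[base_cores], base_cores, 0.5)
    print(f"{n_cores:4d} cores: measured {elapsed:.4f}, reference {ideal:.4f}")
```

Changing `exponent` to 1.0 gives a linear $O(N_{core})$ reference with the same helper.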


| # of Cores | # of Processes | # of SMP | $N_{C}$ | $l_{max}$ | $(N_{r},N_{\theta},N_{\phi})$ | Elapsed | Nonlinear | Solver | Comm. | SUs |
|---|---|---|---|---|---|---|---|---|---|---|
| 32 | 8 | 4 | 32 | 255 | (33,384,768) | 0.525404 | 0.389331 | 0.0906778 | 0.0499037 | 46.7025 |
| 64 | 16 | 4 | 64 | 255 | (65,384,768) | 0.586558 | 0.396212 | 0.100678 | 0.0558089 | 104.277 |
| 128 | 32 | 4 | 128 | 255 | (129,384,768) | 0.694737 | 0.396308 | 0.105363 | 0.0500511 | 247.018 |
| 256 | 64 | 4 | 256 | 255 | (257,384,768) | 1.06998 | 0.398631 | 0.134563 | 0.0538146 | 760.877 |


Elapsed time for weak scaling in the radial resolution. Results with 4 OpenMP threads are shown. The dotted line shows $O(N_{core})$ scaling.
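To compare the measurements with a power-law reference, one option is to estimate the observed exponent from a least-squares fit of log(elapsed) against log(cores). The Python sketch below does this for the radial weak scaling runs; the fitting approach and the function name `fitted_exponent` are illustrative assumptions, not the method used to produce the plot.

```python
import math


def fitted_exponent(cores, elapsed):
    """Least-squares slope of log(elapsed) versus log(cores)."""
    xs = [math.log(c) for c in cores]
    ys = [math.log(t) for t in elapsed]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    return sxy / sxx


# Cores and elapsed times from the radial weak scaling table.
cores = [32, 64, 128, 256]
elapsed = [0.525404, 0.586558, 0.694737, 1.06998]
print(f"observed exponent: {fitted_exponent(cores, elapsed):.2f}")
```

For these four runs the fitted exponent comes out at about 0.33, so the elapsed time grows more slowly than the $O(N_{core})$ reference.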


files

wg/dynamo/performance_results/magic5.txt · Last modified: 2018/11/28 21:56 (external edit)