wg:dynamo:performance_results:magic5

compile options
Definition of columns
Single Processor Result
Strong Scaling Results
Weak Scaling Results

compile options

F90OPTFLAGS = -O3 -warn all -g -xhost -openmp

Definition of columns

Name	Definition
# of Cores	Number of used CPU cores
# of Processes	Number of MPI processes
# of Threads	Number of threads for each process
$N_{C}$	Truncation level for Chebyshev polynomials
$l_{max}$	Truncation level for spherical harmonics
$(N_{r},N_{\theta},N_{\phi})$	Number of grid points in spherical coordinates
Elapsed	Elapsed (wall clock) time for one time step
Nonlinear	Elapsed (wall clock) time for nonlinear terms (including communications)
Solver	Elapsed (wall clock) time for linear calculation
Comm.	Elapsed (wall clock) time for data communication
Efficiency	Parallel efficiency
SUs	Service units (core hours) for $10^{4}$ time steps
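
The SUs column can be cross-checked from the core count and the elapsed time per step. The sketch below assumes that Elapsed is given in seconds (the page does not state the unit, but this assumption reproduces the strong scaling values) and that SUs = cores × elapsed × $10^{4}$ / 3600. The function name `service_units` is illustrative, not part of the benchmarked code.

```python
def service_units(n_cores: int, elapsed_per_step: float, n_steps: int = 10_000) -> float:
    """Core hours used by n_steps time steps.

    Assumption: elapsed_per_step is wall clock time in seconds.
    """
    core_seconds = n_cores * elapsed_per_step * n_steps
    return core_seconds / 3600.0  # seconds -> hours


if __name__ == "__main__":
    # Strong scaling rows (cores, elapsed); the table lists 221.85 and 308.195 SUs.
    for cores, elapsed in [(8, 9.98327), (16, 6.93438)]:
        print(cores, round(service_units(cores, elapsed), 3))
```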

Single Processor Result

$N_{C}$	$l_{max}$	$(N_{r},N_{\theta},N_{\phi})$	Elapsed	Nonlinear	Solver	Comm.	SUs
72	47	( 73,72,144)	0.353396	0.249238	0.965605	0.00720803	15.7065

Strong Scaling Results

$N_{C}$	$l_{max}$	$(N_{r},N_{\theta},N_{\phi})$
192	255	(192,384,768)

# of Cores	# of Processes	# of Threads	Elapsed	Nonlinear	Solver	Comm.	Efficiency	SUs
8	2	4	9.98327	6.70605	2.44661	0.413657	1.3892	221.85
8	4	2	9.21265	6.62681	2.19377	0.230439	1.5054	204.726
16	2	8	7.94273	5.06533	2.13615	0.538236	0.873047	353.01
16	4	4	6.93438	4.67925	1.80959	0.298254	1	308.195
16	8	2	7.12601	5.21069	1.70288	0.245879	0.973108	316.712
32	4	8	3.91464	2.45256	1.01179	0.220971	0.885697	347.968
32	8	4	3.65146	2.36565	0.96631	0.243386	0.949535	324.574
32	16	2	3.76134	2.62796	0.861258	0.165907	0.921796	334.342
64	8	8	2.01431	1.22438	0.465491	0.146756	0.860641	358.099
64	16	4	1.942	1.18073	0.465075	0.188362	0.892685	345.245
64	32	2	2.0975	1.33229	0.360439	0.108833	0.826505	372.889
128	16	8	1.15653	0.633651	0.241426	0.119156	0.749479	411.212
128	32	4	1.0925	0.591954	0.182997	0.0788043	0.793407	388.445
128	64	2	1.46869	0.662761	0.175857	0.0781756	0.590182	522.202
256	32	8	0.713037	0.318539	0.103698	0.0459037	0.607821	507.049
256	64	4	0.799634	0.298149	0.0985314	0.0466445	0.541996	568.629
512	64	8	0.597079	0.156069	0.0687971	0.0383883	0.362932	849.18

Elapsed (wall clock) time for the strong scaling. The numbers indicate the number of OpenMP threads. The dotted line shows ideal scaling.

Parallel efficiency for the strong scaling. The fastest result with 16 cores (one node) is taken as the reference. The numbers indicate the number of OpenMP threads.
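
The Efficiency column appears to follow the usual strong scaling definition relative to the 16-core reference run (4 processes, elapsed 6.93438): efficiency = (reference cores × reference elapsed) / (cores × elapsed). The following is a minimal sketch under that assumption; `parallel_efficiency` and its default arguments are illustrative names chosen here.

```python
def parallel_efficiency(
    n_cores: int,
    elapsed: float,
    ref_cores: int = 16,
    ref_elapsed: float = 6.93438,  # fastest 16-core result in the table
) -> float:
    """Strong scaling efficiency relative to a reference run.

    A value of 1 means the run costs the same number of core-seconds
    per time step as the reference; values below 1 mean it costs more.
    """
    return (ref_cores * ref_elapsed) / (n_cores * elapsed)


if __name__ == "__main__":
    # (cores, elapsed) pairs taken from the strong scaling table.
    for cores, elapsed in [(8, 9.98327), (32, 3.65146), (512, 0.597079)]:
        print(cores, round(parallel_efficiency(cores, elapsed), 4))
```

Values above 1 for the 8-core runs follow directly from choosing the 16-core run as the reference: the half-node runs use fewer core-seconds per step than the full-node reference.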

Weak Scaling Results

# of Cores	# of Processes	# of Threads	$N_{C}$	$l_{max}$	$(N_{r},N_{\theta},N_{\phi})$	Elapsed	Nonlinear	Solver	Comm.	SUs
4	1	4	256	31	(257,48,96)	0.239627	0.117565	0.108877	0.01385	22.2638
16	4	4	256	31	(257,96,192)	0.265171	0.128896	0.0825426	0.000955493	22.2638
64	16	4	256	63	(257,192,384)	0.410822	0.223191	0.110366	0.0318171	106.733
256	64	4	256	127	(257,384,768)	1.06998	0.398631	0.134563	0.0538146	519.207

Elapsed time for the weak scaling in the horizontal resolution. Results with 4 OpenMP threads are shown. The dotted line shows $O(N_{core}^{1/2})$ scaling (ideal scaling for the Legendre transform).

# of Cores	# of Processes	# of Threads	$N_{C}$	$l_{max}$	$(N_{r},N_{\theta},N_{\phi})$	Elapsed	Nonlinear	Solver	Comm.	SUs
32	8	4	32	255	(33,384,768)	0.525404	0.389331	0.0906778	0.0499037	46.7025
64	16	4	64	255	(65,384,768)	0.586558	0.396212	0.100678	0.0558089	104.277
128	32	4	128	255	(129,384,768)	0.694737	0.396308	0.105363	0.0500511	247.018
256	64	4	256	255	(257,384,768)	1.06998	0.398631	0.134563	0.0538146	760.877

Elapsed time for the weak scaling in the radial resolution. Results with 4 OpenMP threads are shown. The dotted line shows $O(N_{core})$ scaling.
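
The dotted reference lines in the two weak scaling plots are power laws in the core count. The page does not say where the lines are anchored, so the sketch below assumes each line passes through the smallest run of its table; `reference_time` is a hypothetical helper name.

```python
def reference_time(t0: float, n0: int, n_cores: int, exponent: float) -> float:
    """Reference elapsed time t0 * (n_cores / n0) ** exponent.

    Assumption: the reference curve is anchored at the run with n0 cores
    and elapsed time t0.
    """
    return t0 * (n_cores / n0) ** exponent


if __name__ == "__main__":
    # Horizontal resolution: exponent 1/2, anchored at the 4-core run.
    for n in (4, 16, 64, 256):
        print("horizontal", n, f"{reference_time(0.239627, 4, n, 0.5):.4f}")
    # Radial resolution: exponent 1, anchored at the 32-core run.
    for n in (32, 64, 128, 256):
        print("radial", n, f"{reference_time(0.525404, 32, n, 1.0):.4f}")
```

Comparing these reference values with the Elapsed column shows how far each measured run falls from the plotted scaling line.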
