
Compile options

F90OPTFLAGS = -O3 -xhost

Name	Description
# of Cores	Number of CPU cores used
# of Processes	Number of MPI processes
# of SMP	Number of SMP threads per MPI process
$l_{max}$	Truncation level for spherical harmonics
$(N_{r},N_{\theta},N_{\phi})$	Number of grid points in the radial, meridional, and azimuthal directions
Elapsed	Elapsed wall-clock time per time step (s)
Nonlinear	Wall-clock time per time step spent on the nonlinear terms, including communication (s)
Solver	Wall-clock time per time step spent on the linear calculation (s)
Comm.	Wall-clock time per time step spent on data communication (s)
Efficiency	Parallel efficiency relative to the run marked 1.0 in the same table
SUs	Service units for $10^{4}$ time steps (core hours)
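The SUs and Efficiency columns can be reproduced from the elapsed time per step. The sketch below assumes that SUs = (number of cores) × (elapsed seconds per step) × $10^{4}$ / 3600 and that strong-scaling efficiency is $(P_{0}T_{0})/(PT)$ against the reference run of each table; both relations are inferred from the tabulated numbers rather than stated on this page, and the function names are hypothetical.

```python
def service_units(cores: int, elapsed_per_step: float, steps: int = 10_000) -> float:
    """Core hours for `steps` time steps, given wall-clock seconds per step."""
    return cores * elapsed_per_step * steps / 3600.0


def strong_efficiency(ref_cores: int, ref_elapsed: float,
                      cores: int, elapsed: float) -> float:
    """Strong-scaling efficiency relative to a reference run: (P0 * T0) / (P * T)."""
    return (ref_cores * ref_elapsed) / (cores * elapsed)


# l_max = 47, (N_r, N_theta, N_phi) = (48, 72, 144): (cores, elapsed seconds per step)
rows = [(1, 1.04900), (2, 0.538092), (4, 0.274424), (8, 0.145301), (16, 0.095041)]
ref_cores, ref_elapsed = rows[0]  # the serial run is the reference (efficiency 1.0)
for cores, elapsed in rows:
    su = service_units(cores, elapsed)
    eff = strong_efficiency(ref_cores, ref_elapsed, cores, elapsed)
    print(f"{cores:3d} cores: SUs = {su:7.4f}, efficiency = {eff:.4f}")
```

The same two functions reproduce the other strong-scaling tables once the reference row is changed.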

$l_{max}$	$(N_{r},N_{\theta},N_{\phi})$	Elapsed	Nonlinear	Solver	Comm.	SUs
47	( 73,72,144)	1.604797	1.56274	0.042059	0.508469	71.3243

$l_{max}$	$(N_{r},N_{\theta},N_{\phi})$
47	(48,72,144)

# of Cores	# of Processes	# of SMP	Elapsed	Nonlinear	Solver	Comm.	Efficiency	SUs
1	1	1	1.04900	0.880913	0.168083	0.360225	1.0	2.91389
2	2	1	0.538092	0.453756	0.0843343	0.194296	0.97474	2.9894
4	4	1	0.274424	0.23125	0.0431727	0.0996035	0.955636	3.04916
8	8	1	0.145301	0.122894	0.0224057	0.0558862	0.902437	3.22891
16	16	1	0.095041	0.0821946	0.0128446	0.0449899	0.689833	4.22404

$l_{max}$	$(N_{r},N_{\theta},N_{\phi})$
127	(256,192,384)

# of Cores	# of Processes	# of SMP	Elapsed	Nonlinear	Solver	Comm.	Efficiency	SUs
4	4	1	17.9454	16.918	1.02739	6.01468	1.45157	199.393
8	8	1	9.98835	9.45138	0.536971	3.40571	1.30397	221.963
16	16	1	6.51225	6.2059	0.30635	2.40569	1.0	289.433
32	32	1	3.30412	3.15372	0.15040	1.23981	0.985473	293.700
64	64	1	1.71612	1.64141	0.0747133	0.6747	0.948685	305.089
128	128	1	0.91125	0.870656	0.040594	0.383074	0.893313	324.000

$l_{max}$	$(N_{r},N_{\theta},N_{\phi})$
255	(513,384,768)

# of Cores	# of Processes	# of SMP	Elapsed	Nonlinear	Solver	Comm.	Efficiency	SUs
128	128	1	10.4146	10.0946	0.320031	4.01002	1.0	3702.97
256	256	1	5.4918	5.33942	0.152376	2.2764	0.948195	3905.28

Elapsed (wall-clock) time for strong scaling. Ideal scaling is plotted as a dotted line.

Parallel efficiency for strong scaling.
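The dotted "ideal scaling" line for strong scaling is normally $T_{ideal}(P) = T_{0}P_{0}/P$, with the elapsed time inversely proportional to the core count. A minimal sketch using the $l_{max}=127$ runs, assuming the 16-core run (efficiency 1.0 in that table) as the reference; the helper name is hypothetical. The ratio of ideal to measured time equals the tabulated efficiency.

```python
# l_max = 127, (N_r, N_theta, N_phi) = (256, 192, 384): (cores, elapsed seconds per step)
runs = [(4, 17.9454), (8, 9.98835), (16, 6.51225),
        (32, 3.30412), (64, 1.71612), (128, 0.91125)]


def ideal_elapsed(ref_cores: int, ref_elapsed: float, cores: int) -> float:
    """Ideal strong scaling: elapsed time inversely proportional to the core count."""
    return ref_elapsed * ref_cores / cores


ref_cores, ref_elapsed = 16, 6.51225  # the run with efficiency 1.0 in the table
for cores, elapsed in runs:
    ideal = ideal_elapsed(ref_cores, ref_elapsed, cores)
    print(f"{cores:4d} cores: measured {elapsed:8.4f} s, ideal {ideal:8.4f} s, "
          f"efficiency {ideal / elapsed:.4f}")
```

Runs on fewer cores than the reference come out with efficiency above 1.0, as in the table, because the reference is not the smallest run.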

# of Cores	# of Processes	# of SMP	$l_{max}$	$(N_{r},N_{\theta},N_{\phi})$	Elapsed	Nonlinear	Solver	Comm.	SUs
4	4	1	31	(256,48,96)	0.367794	0.302528	0.0652638	0.13866	4.0866
16	16	1	63	(256,96,192)	0.829839	0.757857	0.0719786	0.333987	36.8817
64	64	1	127	(256,192,384)	1.71612	1.64141	0.0747133	0.674700	305.089
256	256	1	255	(256,384,768)	2.74791	2.67373	0.0741758	1.13636	1954.07

Elapsed time for weak scaling in horizontal resolution. Ideal scaling for the Legendre transform is plotted as a dotted line.
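The exact form of the dotted Legendre-transform line is not given here. A plausible reconstruction, offered as a hypothetical sketch: a direct (non-fast) Legendre transform costs on the order of $N_{r}(l_{max}+1)^{3}$ operations per step, so with the core count growing as $(l_{max}+1)^{2}$ the ideal elapsed time grows as $(l_{max}+1)^{3}/P$. The helper name and the choice of the 4-core run as reference are assumptions.

```python
# Weak scaling in horizontal resolution, N_r = 256: (cores, l_max, elapsed seconds per step)
runs = [(4, 31, 0.367794), (16, 63, 0.829839), (64, 127, 1.71612), (256, 255, 2.74791)]


def ideal_legendre_elapsed(ref, cores: int, lmax: int) -> float:
    """Ideal time if work per step grows as (l_max + 1)**3 and is split evenly over cores."""
    ref_cores, ref_lmax, ref_elapsed = ref
    work_ratio = ((lmax + 1) / (ref_lmax + 1)) ** 3
    return ref_elapsed * work_ratio * ref_cores / cores


for cores, lmax, elapsed in runs:
    ideal = ideal_legendre_elapsed(runs[0], cores, lmax)
    print(f"l_max = {lmax:3d} on {cores:3d} cores: "
          f"measured {elapsed:.4f} s, ideal {ideal:.4f} s")
```

Because the core count quadruples while the assumed work grows eightfold each time $l_{max}+1$ doubles, this ideal time doubles at each step.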

# of Cores	# of Processes	# of SMP	$l_{max}$	$(N_{r},N_{\theta},N_{\phi})$	Elapsed	Nonlinear	Solver	Comm.	SUs
32	32	1	255	(64,384,768)	5.15760	4.98965	0.167950	1.90546	458.454
64	64	1	255	(128,384,768)	5.15654	4.99703	0.159511	1.92557	916.718
128	128	1	255	(256,384,768)	5.30425	5.14686	0.157383	2.07861	1885.96
256	256	1	255	(512,384,768)	5.49180	5.33942	0.152376	2.2764	3905.28

Elapsed time for weak scaling in radial resolution. Ideal weak scaling corresponds to a constant elapsed time.
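Since the number of radial points per core is fixed (two per core in every run), a natural weak-scaling efficiency is $T_{0}/T$, the reference elapsed time divided by the measured one. This metric is not tabulated on this page; the sketch below is an illustration that uses the 32-core run as a hypothetical reference.

```python
# Weak scaling in radial resolution, l_max = 255: (cores, N_r, elapsed seconds per step)
runs = [(32, 64, 5.15760), (64, 128, 5.15654), (128, 256, 5.30425), (256, 512, 5.49180)]


def weak_efficiency(ref_elapsed: float, elapsed: float) -> float:
    """Weak-scaling efficiency: work per core is fixed, so ideal elapsed time is constant."""
    return ref_elapsed / elapsed


ref_elapsed = runs[0][2]
for cores, n_r, elapsed in runs:
    eff = weak_efficiency(ref_elapsed, elapsed)
    print(f"{cores:4d} cores, N_r = {n_r:3d}: efficiency {eff:.4f}")
```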