wg:dynamo:performance

compile options
Definition of columns
Two Processor Result
Strong Scaling Results
Weak Scaling Results

compile options

F90OPTFLAGS = -r8 -i4 -ftz -IPF_fma -IPF_fltacc -WB -O2

Definition of columns

Name	Definition
# of Cores	Number of CPU cores used
# of Processes	Number of MPI processes
# of Threads	Number of threads per process
$N_{c}$	Truncation level for Chebyshev polynomials
$l_{max}$	Truncation level for spherical harmonics
$(N_{r},N_{\theta},N_{\phi})$	Number of grid points in spherical coordinates
Elapsed	Elapsed (wall clock) time for one time step, in seconds
Nonlinear	Elapsed (wall clock) time for nonlinear terms (including communication)
Solver	Elapsed (wall clock) time for linear calculation
Comm.	Elapsed (wall clock) time for data communication
Efficiency	Parallel efficiency
SUs	Service units (core hours) for $10^{4}$ time steps
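
The SUs column can be reproduced from the Elapsed column. The following is a minimal Python sketch, not part of the benchmark code: the function name `service_units` and the parameter `cores_per_node` are illustrative. The rounding to whole 16-core nodes is an inference from the tabulated values (for example, the 86-core run matches a charge for 96 cores); it is not stated on this page.

```python
import math


def service_units(elapsed_per_step, n_cores, n_steps=10_000, cores_per_node=1):
    """Core hours charged for n_steps time steps.

    elapsed_per_step is the wall clock time per step in seconds.
    cores_per_node=1 charges only the cores actually used; a larger value
    rounds the core count up to whole nodes (an assumption, see above).
    """
    charged_cores = math.ceil(n_cores / cores_per_node) * cores_per_node
    return elapsed_per_step * n_steps * charged_cores / 3600.0


# 32-core strong scaling run: the core count is already a multiple of 16.
print(service_units(0.719657, 32))
# 86-core run, assuming whole 16-core nodes are charged (96 cores).
print(service_units(2.47102, 86, cores_per_node=16))
```

With these inputs the sketch agrees with the tabulated 63.9695 and 658.939 core hours to the printed precision.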

Two Processor Result

$N_{c}$	$l_{max}$	$(N_{r},N_{\theta},N_{\phi})$	Elapsed	Nonlinear	Solver	Comm.	SUs

Strong Scaling Results

$N_{c}$	$l_{max}$	$(N_{r},N_{\theta},N_{\phi})$
192	170	(193,256,256)

# of Cores	# of Processes	# of SMP	Elapsed	Nonlinear	Solver	Comm.	Efficiency	SUs
32	32	1	0.719657	0.518729	0.200927	0.182378	1	63.9695
64	64	1	0.545835	0.336805	0.209030	0.21079	0.659226	97.0373
128	128	1	0.285552	0.19960	0.085952	0.174127	0.630058	101.529
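
The Efficiency column in this table is consistent with the usual strong scaling definition, relative to the smallest run: $(T_{ref} N_{ref}) / (T N)$. The Python sketch below recomputes it from the Elapsed column. The function name `parallel_efficiency` is illustrative, and the formula is inferred from the tabulated numbers rather than quoted from the benchmark scripts.

```python
def parallel_efficiency(t_ref, n_ref, t, n):
    """Strong scaling efficiency relative to a reference run.

    t_ref, t: elapsed time per step for the reference run and the run of interest.
    n_ref, n: number of cores for the reference run and the run of interest.
    """
    return (t_ref * n_ref) / (t * n)


# (cores, elapsed time per step) taken from the table above.
runs = [(32, 0.719657), (64, 0.545835), (128, 0.285552)]
n_ref, t_ref = runs[0]

for n, t in runs:
    print(n, round(parallel_efficiency(t_ref, n_ref, t, n), 6))
```

The same function applies to the larger case below if the 86-core run is used as the reference.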

$N_{c}$	$l_{max}$	$(N_{r},N_{\theta},N_{\phi})$
256	341	(257,512,512)

# of Cores	# of Processes	# of SMP	Elapsed	Nonlinear	Solver	Comm.	Efficiency	SUs
86	86	1	2.47102	1.88142	0.589604	0.379399	1	658.939
128	128	1	2.32883	1.91045	0.418383	0.857392	0.712896	828.03
129	129	1	2.09683	1.68765	0.409184	0.663364	0.785635	838.734
256	256	1	1.41293	1.13385	0.279076	0.675334	0.58751	1004.75
257	257	1	1.33908	1.06534	0.273739	0.621596	0.61991	1011.75

Elapsed (wall clock) time for the strong scaling. The numbers indicate the number of OpenMP threads. The dotted line shows ideal scaling.

Parallel Efficiency for the strong scaling.

Weak Scaling Results

# of Cores	# of Processes	# of SMP	$N_{c}$	$l_{max}$	$(N_{r},N_{\theta},N_{\phi})$	Elapsed	Nonlinear	Solver	Comm.	SUs
4	4	1	256	42	(257,64,64)	0.322194	0.165516	0.156679	0.0525075	14.3197
16	16	1	256	85	(257,128,128)	0.406717	0.204322	0.202394	0.0716188	18.0763
64	64	1	256	170	(257,256,256)	0.743692	0.44779	0.295902	0.312471	132.212
256	256	1	256	341	(257,512,512)	1.41293	1.13385	0.279076	0.675334	1004.75

5	5	1	256	42	(257,64,64)	0.252394	0.127396	0.124998	0.0395718	11.2175
17	17	1	256	85	(257,128,128)	0.359973	0.180849	0.179124	0.0811399	31.9976
65	65	1	256	170	(257,256,256)	0.639472	0.363557	0.275915	0.227634	142.105
257	257	1	256	341	(257,512,512)	1.33908	1.06534	0.273739	0.621596	1011.75

Elapsed time for the weak scaling in the horizontal directions. The dotted line shows $O(N_{core}^{1/2})$ scaling, the ideal scaling for the Legendre transform.
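
One way to read the $N_{core}^{1/2}$ reference: in the runs above $l_{max}$ doubles each time the core count quadruples, so if the Legendre transform cost grows as $l_{max}^{3}$, the work per core grows as $N_{core}^{3/2} / N_{core} = N_{core}^{1/2}$. The Python sketch below compares the measured times of the first set of runs with such a reference curve. Anchoring the curve at the smallest run is an assumption made here for illustration; the normalization used in the figure is not stated. The function name `sqrt_reference` is hypothetical.

```python
def sqrt_reference(n_cores, n0, t0):
    """O(N_core^(1/2)) reference time anchored at the run (n0, t0)."""
    return t0 * (n_cores / n0) ** 0.5


# (cores, elapsed time per step) from the first horizontal weak scaling table.
runs = [(4, 0.322194), (16, 0.406717), (64, 0.743692), (256, 1.41293)]
n0, t0 = runs[0]

for n, t in runs:
    ref = sqrt_reference(n, n0, t0)
    print(f"{n:4d} cores: measured {t:.3f}, reference {ref:.3f}")
```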

# of Cores	# of Processes	# of SMP	$N_{c}$	$l_{max}$	$(N_{r},N_{\theta},N_{\phi})$	Elapsed	Nonlinear	Solver	Comm.	SUs
8	8	1	16	341	(17,512,512)	1.28171	1.20689	0.0748194	0.372245	56.9651
16	16	1	32	341	(33,512,512)	1.52159	1.42569	0.0958951	0.315524	67.6261
32	32	1	64	341	(65,512,512)	1.61663	1.45842	0.158208	0.389299	143.7
64	64	1	128	341	(129,512,512)	1.75436	1.50559	0.248772	0.453414	311.887
128	128	1	256	341	(257,512,512)	2.32883	1.91045	0.418383	0.857392	828.03

9	9	1	16	341	(17,512,512)	1.05288	0.993848	0.0590345	0.207514	46.7948
17	17	1	32	341	(33,512,512)	1.03108	0.94969	0.0813889	0.194925	91.6515
33	33	1	64	341	(65,512,512)	1.25166	1.10257	0.149091	0.218563	166.888
65	65	1	128	341	(129,512,512)	1.43519	1.19399	0.241198	0.262446	318.931
129	129	1	256	341	(257,512,512)	2.09683	1.68765	0.409184	0.663364	838.734

Elapsed time for the weak scaling in the radial direction. The dotted line shows $O(N_{core}^{1/2})$ scaling, the ideal scaling for the Legendre transform.
