wg:dynamo:performance_results:gary

compile options

F90OPTFLAGS = -r8 -i4 -ftz -IPF_fma -IPF_fltacc -WB -O3 -xhosts

Name	Description
# of Cores	Number of CPU cores used
# of Processes	Number of MPI processes
# of Threads	Number of threads for each process
$N_{c}$	Truncation level for Chebyshev polynomials
$l_{max}$	Truncation level for spherical harmonics
$(N_{r},N_{\theta},N_{\phi})$	Number of grid points in spherical coordinates
Elapsed	Elapsed (wall clock) time for one time step
Legendre	Elapsed (wall clock) time for the Legendre transform
Implicit	Elapsed (wall clock) time for the linear calculation
Efficiency	Parallel efficiency
SUs	Service units (core hours) for $10^{4}$ time steps
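
As a rough illustration of the SUs definition above, the following Python sketch converts the elapsed time per step into core hours for $10^{4}$ steps. The function name `service_units` and the optional `cores_per_node` argument are hypothetical. Rounding the charge up to whole 16-core nodes is an assumption inferred from the tabulated values, not something this page states.

```python
import math

def service_units(elapsed_s, cores, steps=10_000, cores_per_node=None):
    """Core hours for `steps` time steps at `elapsed_s` seconds per step.

    If cores_per_node is given, the charged core count is rounded up to
    whole nodes (an assumption inferred from the tables, not documented).
    """
    charged = cores
    if cores_per_node is not None:
        charged = math.ceil(cores / cores_per_node) * cores_per_node
    return elapsed_s * steps * charged / 3600.0

# 16-core strong-scaling run: 7.8559 s per step
su_16 = service_units(7.8559, 16)
# 3-core weak-scaling run, charged as one full 16-core node (assumed)
su_3 = service_units(2.3095, 3, cores_per_node=16)
print(f"{su_16:.3f} {su_3:.3f}")
```

With these inputs the sketch gives values close to the 349.151 and 102.644 SUs listed for those runs.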

$N_{c}$	$l_{max}$	$(N_{r},N_{\theta},N_{\phi})$	Elapsed	Legendre	Implicit	LUdecomp	SUs
71	47	(73,72,144)	0.96659	0.57970	0.13313	0.010014	2.6849

$N_{c}$	$l_{max}$	$(N_{r},N_{\theta},N_{\phi})$
191	255	(193,384,768)

# of Cores	# of Processes	# of SMP	Elapsed	Legendre	Implicit	Efficiency	SUs
16	16	1	7.8559	3.1132	1.0993	1.0	349.151
32	32	1	4.4581	1.5484	0.67073	0.881082	396.276
64	64	1	3.4032	0.77098	0.68621	0.577097	605.013
128	128	1	1.0696	0.37643	0.14921	0.918089	380.302

$N_{c}$	$l_{max}$	$(N_{r},N_{\theta},N_{\phi})$
255	511	(257,768,1536)

# of Cores	# of Processes	# of SMP	Elapsed	Legendre	Implicit	Efficiency	SUs
64	64	1	13.018	4.7327	1.9132	1.0	414.015
128	128	1	8.7973	2.3534	1.7398	0.555322	745.541
256	256	1	8.678	1.1378	4.3325	0.412058	1004.75

Elapsed (wall clock) time for the strong scaling. The numbers indicate the number of OpenMP threads. Ideal scaling is plotted as a dotted line.

Parallel Efficiency for the strong scaling.
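
The Efficiency column of the $l_{max} = 255$ strong-scaling table is consistent with measuring each run against the smallest core count, $E(p) = T(p_0)\,p_0 / (T(p)\,p)$. This formula is inferred from the numbers rather than stated on this page, so the Python sketch below is only one plausible reading. The function name `parallel_efficiency` is hypothetical.

```python
def parallel_efficiency(cores, elapsed):
    """Efficiency of each run relative to the first (smallest) run."""
    p0, t0 = cores[0], elapsed[0]
    return [(t0 * p0) / (t * p) for p, t in zip(cores, elapsed)]

# Values from the l_max = 255 strong-scaling table
cores = [16, 32, 64, 128]
elapsed = [7.8559, 4.4581, 3.4032, 1.0696]
efficiency = parallel_efficiency(cores, elapsed)
for p, e in zip(cores, efficiency):
    print(f"{p:4d} cores: efficiency = {e:.6f}")
```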

# of Cores	# of Processes	# of SMP	$N_{c}$	$l_{max}$	$(N_{r},N_{\theta},N_{\phi})$	Elapsed	Legendre	Implicit	SUs
2	2	1	255	63	(257,96,192)	3.3257	1.8103	0.47442	147.809
8	8	1	255	127	(257,192,384)	4.0801	1.8754	0.51211	181.338
32	32	1	255	255	(257,384,768)	5.8172	2.0489	0.8497	517.084
128	128	1	255	511	(257,768,1536)	9.5023	2.3534	1.7398	3378.6

3	3	1	255	63	(257,96,192)	2.3095	1.1811	0.3029	102.644
9	9	1	255	127	(257,192,384)	3.5408	1.7574	0.46589	157.369
33	33	1	255	255	(257,384,768)	5.528	1.7904	0.77925	737.067
129	129	1	255	511	(257,768,1536)	8.9802	1.9042	1.7902	3592.1

Elapsed time for the weak scaling in the horizontal resolution. Scaling of $O(N_{core}^{1/2})$ is plotted as a dotted line.

# of Cores	# of Processes	# of SMP	$N_{c}$	$l_{max}$	$(N_{r},N_{\theta},N_{\phi})$	Elapsed	Legendre	Implicit	SUs
16	16	1	31	511	(33,768,1536)	5.1154	2.5566	0.49331	227.351
32	32	1	63	511	(65,768,1536)	6.0620	2.4221	0.76595	538.844
64	64	1	127	511	(129,768,1536)	6.9065	2.3967	1.1075	1227.82
128	128	1	255	511	(257,768,1536)	9.5023	2.3534	1.7398	3378.6

17	17	1	31	511	(33,768,1536)	4.2943	2.4266	0.40772	381.716
33	33	1	63	511	(65,768,1536)	5.6936	2.2617	0.67252	759.147
65	65	1	127	511	(129,768,1536)	6.7351	2.1191	1.0571	1496.69
129	129	1	255	511	(257,768,1536)	8.9802	1.9042	1.7902	3592.08

Elapsed time for the weak scaling in the radial resolution. Scaling of $O(N_{core})$ is plotted as a dotted line.
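
The dotted reference lines in the two weak-scaling plots can be sketched as power laws in the core count. The Python snippet below assumes each line is anchored at the first data point of its series. The page does not say how the lines are normalized, so that anchoring and the function name `reference_curve` are illustrative only.

```python
def reference_curve(cores, t0, exponent):
    """Power-law reference t0 * (N / N0)**exponent, anchored at the first run."""
    n0 = cores[0]
    return [t0 * (n / n0) ** exponent for n in cores]

# Horizontal-resolution series: O(N_core^(1/2)) reference
horizontal = reference_curve([2, 8, 32, 128], t0=3.3257, exponent=0.5)
# Radial-resolution series: O(N_core) reference
radial = reference_curve([16, 32, 64, 128], t0=5.1154, exponent=1.0)

for name, curve in (("horizontal", horizontal), ("radial", radial)):
    print(name, [round(t, 4) for t in curve])
```

Comparing the measured Elapsed column against such a curve shows how far each series departs from the plotted scaling.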