wg:dynamo:performance_results:goddard

compile options

F90OPTFLAGS = -O3 -xhost

More than two processes are required
Time step adjustment routines are implemented

Name	Description
# of Cores	Number of used CPU cores
# of Processes	Number of MPI processes
# of SMP	Number of threads (SMP) for each process
$l_{max}$	Truncation level for spherical harmonics
$(N_{r},N_{\theta},N_{\phi})$	Number of grid points in spherical coordinates
Elapsed	Elapsed (wall clock) time for one time step
Nonlinear	Elapsed (wall clock) time for nonlinear terms (including communications)
Solver	Elapsed (wall clock) time for linear calculation
CFL	Elapsed (wall clock) time for CFL condition check
Efficiency	Parallel efficiency relative to the reference run (the row with Efficiency = 1)
SUs	Service units (core hours) for $10^{4}$ time steps

$l_{max}$	$(N_{r},N_{\theta},N_{\phi})$	Elapsed	Nonlinear	Solver	SUs
47	(72,72,144)	1.32979	0.510249	0.819544	46.7169

$l_{max}$	$(N_{r},N_{\theta},N_{\phi})$
47	(72,72,144)

# of Cores	# of Processes	# of SMP	Elapsed	Nonlinear	Solver	Efficiency	SUs
2	1	1	1.32979	0.510249	0.819544	1	46.7169
4	3	1	0.720192	0.2817	0.438491	0.923222	32.7459
8	7	1	0.375957	0.155108	0.220847	0.884273	29.8152
16	15	1	0.248631	0.104707	0.143922	0.668557	35.9047
25	24	1	0.158468	0.0812083	0.0772578	0.671326	35.9047
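
The Efficiency column in the table above can be reproduced from the Cores and Elapsed columns. The following is a minimal sketch, assuming efficiency is defined as (reference elapsed × reference cores) / (elapsed × cores) with the 2-core run as the reference; the function name `parallel_efficiency` is illustrative and not taken from the benchmarked code.

```python
def parallel_efficiency(ref_cores, ref_elapsed, cores, elapsed):
    """Efficiency relative to a reference run: (T_ref * C_ref) / (T * C)."""
    return (ref_elapsed * ref_cores) / (elapsed * cores)


# (cores, elapsed seconds per time step), copied from the table above
runs = [
    (2, 1.32979),
    (4, 0.720192),
    (8, 0.375957),
    (16, 0.248631),
    (25, 0.158468),
]

# Assumption: the first row (2 cores) is the reference, since its efficiency is 1
ref_cores, ref_elapsed = runs[0]
efficiencies = [
    parallel_efficiency(ref_cores, ref_elapsed, cores, elapsed)
    for cores, elapsed in runs
]

for (cores, _), eff in zip(runs, efficiencies):
    print(f"{cores:4d} cores: efficiency {eff:.4f}")
```

Note that the ratio uses the number of cores, not the number of MPI processes; using processes does not reproduce the tabulated values.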

$l_{max}$	$(N_{r},N_{\theta},N_{\phi})$
63	(124,96,192)

# of Cores	# of Processes	# of SMP	Elapsed	Nonlinear	Solver	Efficiency	SUs
4	3	1	4.49747	0.942879	3.55459	1.23093	199.888
8	7	1	2.2319	0.59711	1.63479	1.24022	99.1957
16	15	1	1.38402	0.407052	0.97697	1	61.5122
32	31	1	0.765525	0.245143	0.52038	0.903971	68.0467
64	63	1	0.500593	0.152387	0.348203	0.691192	88.9943
126	125	1	0.300067	0.123445	0.176621	0.585699	106.69
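
The SUs column in the table above is consistent with charging whole 16-core nodes for $10^{4}$ time steps. This charging rule is inferred from the numbers in this table and is not stated explicitly on this page, so treat the sketch below as an assumption; `service_units` and `cores_per_node` are illustrative names.

```python
import math


def service_units(cores, elapsed, steps=10_000, cores_per_node=16):
    """Core hours for `steps` time steps, rounding cores up to whole nodes.

    Assumption: allocations are charged per 16-core node (inferred from the table).
    """
    charged_cores = math.ceil(cores / cores_per_node) * cores_per_node
    hours = elapsed * steps / 3600.0
    return hours * charged_cores


# (cores, elapsed seconds per time step), copied from the table above
for cores, elapsed in [(4, 4.49747), (16, 1.38402), (32, 0.765525), (126, 0.300067)]:
    print(f"{cores:4d} cores: {service_units(cores, elapsed):8.3f} SUs")
```

Under this assumption, runs on 4, 8, and 16 cores are all charged for 16 cores, which is why the SUs drop in proportion to the elapsed time up to 16 cores and rise again beyond that.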

$l_{max}$	$(N_{r},N_{\theta},N_{\phi})$
63	(128,96,192)

# of Cores	# of Processes	# of SMP	Elapsed	Nonlinear	Solver	Efficiency	SUs
4	3	1	4.33759	0.982332	3.35526	1.40591	192.782
8	7	1	2.38804	0.611198	1.77684	1.27684	106.135
16	15	1	1.52457	0.473646	1.05092	1	67.7585
32	31	1	0.839238	0.257673	0.581563	0.908304	74.599
64	63	1	0.5472	0.179547	0.367651	0.696531	97.28
128	127	1	0.335634	0.148279	0.187353	0.567794	119.336

Elapsed (wall clock) time for strong scaling. Ideal scaling is plotted as a dotted line.

Parallel efficiency for strong scaling.

# of Cores	# of Processes	# of SMP	$l_{max}$	$(N_{r},N_{\theta},N_{\phi})$	Elapsed	Nonlinear	Solver	SUs
3	2	1	15	(124,24,48)	0.26251	0.0325999	0.229909	11.6671
9	8	1	31	(124,48,96)	0.383009	0.066082	0.316925	17.0226
32	31	1	63	(124,96,192)	0.765525	0.245143	0.52038	68.0467
125	124	1	127	(124,192,384)	1.55669	0.793595	0.763088	553.488

Elapsed time for weak scaling in the horizontal resolution.
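
As a quick check on the horizontal weak-scaling setup, the sketch below computes the number of horizontal grid points ($N_{\theta} \times N_{\phi}$) per MPI process for each run in the table above. That this quantity is what is held fixed is an assumption drawn from the numbers, not a statement made on this page.

```python
# (processes, l_max, N_theta, N_phi), copied from the weak scaling table above
runs = [
    (2, 15, 24, 48),
    (8, 31, 48, 96),
    (31, 63, 96, 192),
    (124, 127, 192, 384),
]

# Horizontal grid points handled per MPI process
points_per_process = [
    n_theta * n_phi / processes for processes, _, n_theta, n_phi in runs
]

for (processes, l_max, _, _), points in zip(runs, points_per_process):
    print(f"l_max={l_max:3d}, {processes:3d} processes: {points:6.1f} points/process")
```

The value stays between 576 and about 595 across the four runs, so the horizontal grid per process is nearly constant while the elapsed time per step still grows.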

# of Cores	# of Processes	# of SMP	$l_{max}$	$(N_{r},N_{\theta},N_{\phi})$	Elapsed	Nonlinear	Solver	SUs
16	15	1	63	(30,96,192)	0.164901	0.117507	0.0473918	7.32893
32	31	1	63	(62,96,192)	0.238942	0.138128	0.100811	21.2393
64	63	1	63	(124,96,192)	0.500593	0.152387	0.348203	88.9943
127	126	1	63	(248,96,192)	1.19851	0.235588	0.962918	426.137

Elapsed time for weak scaling in the radial resolution. The ideal scaling for the linear solver after LU decomposition is plotted as a dotted line.
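
To see which part of a time step dominates as $N_{r}$ grows, the sketch below computes the solver's share of the elapsed time for the radial weak-scaling runs. It uses only the tabulated values; the variable names are illustrative.

```python
# (N_r, elapsed, nonlinear, solver) in seconds per time step, from the table above
runs = [
    (30, 0.164901, 0.117507, 0.0473918),
    (62, 0.238942, 0.138128, 0.100811),
    (124, 0.500593, 0.152387, 0.348203),
    (248, 1.19851, 0.235588, 0.962918),
]

# Fraction of each time step spent in the linear solver
solver_share = [solver / elapsed for _, elapsed, _, solver in runs]

# Difference between Elapsed and Nonlinear + Solver (should be near zero)
residuals = [
    elapsed - (nonlinear + solver) for _, elapsed, nonlinear, solver in runs
]

for (n_r, _, _, _), share in zip(runs, solver_share):
    print(f"N_r={n_r:3d}: solver share {share:.1%}")
```

The nonlinear and solver times add up to the elapsed time to within rounding, and the solver share rises from under 30% at $N_{r}=30$ to about 80% at $N_{r}=248$.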