Back to performance benchmark lists
Compile options

`F90OPTFLAGS = -O3 -warn all -g -xhost -openmp`
Definition of columns

Name | Description |
--- | --- |
# of Cores | Number of CPU cores used |
# of Processes | Number of MPI processes |
# of Threads | Number of OpenMP threads per process |
$N_{r}$ | Number of nodes in the radial direction |
$N_{sph}$ | Number of nodes on a sphere |
Elapsed time | Elapsed (wall clock) time for one time step |
Solver time | Elapsed (wall clock) time for the linear solver (including communications) |
Comm. time | Elapsed (wall clock) time for data communication |
Efficiency | Parallel efficiency |
SUs | Service units (core hours) for $10^{4}$ time steps |
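The two derived columns can be reproduced from the raw timings. The following is a minimal sketch in Python; the formulas are inferred by checking against the tabulated values (e.g. the 32-core rows of the first strong scaling table), not taken from the benchmark source itself:

```python
# Formulas inferred from the tabulated values, not from the benchmark code.

def service_units(cores: int, elapsed_s: float, steps: int = 10_000) -> float:
    """Core hours consumed to run `steps` time steps at `elapsed_s` seconds each."""
    return cores * elapsed_s * steps / 3600.0

def parallel_efficiency(cores: int, elapsed_s: float,
                        ref_cores: int, ref_elapsed_s: float) -> float:
    """Speedup over the reference run, normalized by the ratio of core counts."""
    return (ref_cores * ref_elapsed_s) / (cores * elapsed_s)

# Reference run for the N_r = 65 strong scaling table: 32 cores, 16 processes,
# 2 threads, elapsed 139.417 s (the row listed with Efficiency = 1.00).
print(service_units(32, 140.320))                         # ~12472.9 SUs
print(parallel_efficiency(64, 62.3732, 32, 139.417))      # ~1.1176
```

Efficiencies above 1.0 in the tables therefore indicate superlinear speedup relative to the chosen 32-core (or, for the larger problem, 512-core) reference run.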
Strong Scaling Results
$N_{r}$ | $N_{sph}$ |
--- | --- |
65 | 31106 |

# of Cores | # of Processes | # of Threads | Elapsed time | Solver time | Comm. time | Efficiency | SUs |
--- | --- | --- | --- | --- | --- | --- | --- |
32 | 32 | 1 | 140.320 | 136.035 | 5.63721 | 0.993565 | 12472.9 |
64 | 64 | 1 | 62.3732 | 60.2832 | 5.83084 | 1.1176 | 11088.6 |
256 | 256 | 1 | 25.7511 | 22.0492 | 2.21876 | 0.676753 | 18311.9 |
32 | 16 | 2 | 139.417 | 134.658 | 2.90400 | 1.00 | 12392.6 |
64 | 32 | 2 | 61.0636 | 58.9420 | 2.62792 | 1.14157 | 10855.8 |
128 | 64 | 2 | 26.2981 | 25.2837 | 1.96978 | 1.32535 | 9350.44 |
256 | 128 | 2 | 18.6194 | 17.9958 | 8.38487 | 0.935966 | 13240.5 |
512 | 256 | 2 | 18.5565 | 18.1882 | 13.8237 | 0.469569 | 26391.5 |
32 | 8 | 4 | 141.713 | 134.061 | 8.63714 | 0.983798 | 12596.7 |
64 | 16 | 4 | 61.3219 | 58.9329 | 2.28762 | 1.13676 | 10901.7 |
128 | 32 | 4 | 26.7351 | 25.6872 | 1.89901 | 1.30369 | 9505.81 |
256 | 64 | 4 | 12.0594 | 11.5573 | 1.29915 | 1.44511 | 8575.57 |
512 | 128 | 4 | 6.50411 | 6.19388 | 0.930524 | 1.3397 | 9250.29 |
1024 | 256 | 4 | 3.61803 | 3.40549 | 1.27324 | 1.20419 | 10291.3 |
256 | 32 | 8 | 12.6620 | 12.1131 | 1.49653 | 1.37633 | 9004.09 |
512 | 64 | 8 | 6.31781 | 6.05579 | 1.326885 | 1.37921 | 8985.33 |
1024 | 128 | 8 | 4.00455 | 3.83623 | 1.61223 | 1.08796 | 11390.7 |
2048 | 256 | 8 | 2.24999 | 2.17732 | 1.06132 | 0.968178 | 12799.9 |
$N_{r}$ | $N_{sph}$ |
--- | --- |
129 | 124418 |

# of Cores | # of Processes | # of Threads | Elapsed time | Solver time | Comm. time | Efficiency | SUs |
--- | --- | --- | --- | --- | --- | --- | --- |
512 | 512 | 1 | 188.6315 | 185.9239 | 79.4327 | 0.703179 | 268276 |
512 | 256 | 2 | 155.7737 | 150.9981 | 48.0654 | 0.851503 | 221545 |
1024 | 512 | 2 | 116.1871 | 114.8309 | 68.0962 | 0.570811 | 330488 |
512 | 128 | 4 | 132.6418 | 131.1039 | 13.0843 | 1.00 | 188646 |
1024 | 256 | 4 | 62.7093 | 60.6450 | 14.2009 | 1.05759 | 178373 |
2048 | 512 | 4 | 29.1312 | 28.4619 | 7.7951 | 1.13831 | 165724 |
1024 | 128 | 8 | 60.5976 | 59.1307 | 7.7176 | 1.09445 | 172367 |
2048 | 256 | 8 | 31.4802 | 30.6008 | 8.3909 | 1.05337 | 179087 |
4096 | 512 | 8 | 17.7632 | 17.4155 | 7.1339 | 0.933403 | 202106 |
2048 | 128 | 16 | 129.3082 | 127.3400 | 78.7965 | 0.256445 | 735620 |
4096 | 256 | 16 | 84.6423 | 83.7060 | 60.9705 | 0.195886 | 963041 |
Elapsed (wall clock) time for the strong scaling. The number of OpenMP threads is indicated by the labels.

Parallel efficiency for the strong scaling. The number of OpenMP threads is indicated by the labels.
Weak Scaling Results
# of Cores | # of Processes | # of Threads | $N_{r}$ | $N_{sph}$ | Elapsed time | Solver time | Comm. time | Iterations for $d{\bf A}/dt$ | SUs |
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
8 | 4 | 2 | 17 | 1946 | 1.27525 | 0.995820 | 0.051852 | 180.75 | 28.3389 |
16 | 4 | 4 | 33 | 1946 | 2.13157 | 1.89065 | 0.104110 | 229.25 | 94.7364 |
32 | 8 | 4 | 17 | 7778 | 2.91725 | 2.64922 | 0.176285 | 309.0 | 259.311 |
64 | 8 | 8 | 33 | 7778 | 3.29256 | 3.03684 | 0.356617 | 358.5 | 585.344 |
128 | 32 | 4 | 65 | 7778 | 4.02430 | 3.77557 | 0.680323 | 404.25 | 1430.86 |
256 | 32 | 8 | 33 | 31106 | 5.94088 | 5.67232 | 0.916604 | 661.7 | 4224.63 |
512 | 64 | 8 | 65 | 31106 | 6.31781 | 6.05579 | 1.32689 | 683.5 | 8985.33 |
1024 | 256 | 4 | 129 | 31106 | 9.44298 | 9.15932 | 2.59881 | 814.0 | 26860 |
2048 | 256 | 8 | 65 | 124418 | 15.0371 | 14.7224 | 4.4994 | 1354.1 | 85544.4 |
4096 | 512 | 8 | 129 | 124418 | 17.7632 | 17.4155 | 7.1339 | 1355.8 | 202106 |
Elapsed time for the weak scaling (red line). The best result among runs with the same number of cores is chosen for plotting. The average iteration count for $d{\bf A}/dt$ is also plotted (black line).