wg:dynamo:performance_results:sfemans


Compile options

F90OPTFLAGS = -O3 -g -xhost

Definition of columns

Name	Description
# of Cores	Number of CPU cores used
# of parallel FEM	Number of subdomains in the meridional plane
# of parallel FFT	Degree of parallelization for the FFT
$N_{med}$	Number of nodes for the fluid in a meridional plane
$N_{\phi}$	Number of nodes (modes) in the longitudinal direction
Elapsed time	Elapsed (wall clock) time for one time step, in seconds
Solver time	Elapsed (wall clock) time for the linear solver (including communications), in seconds
Comm. time	Elapsed (wall clock) time for data communication, in seconds
Efficiency	Parallel efficiency
SUs	Service units (core hours) for $10^{4}$ time steps

Elapsed time is taken from “fort.702” and averaged over 100 time steps and over all cores.
Solver time is taken from “fort.705” and averaged over 100 time steps and over all cores.
Comm. time is taken from “fort.703” and averaged over 100 time steps and over all cores.
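The SUs column in the tables can be reproduced from the elapsed time and the core count. The sketch below assumes that SUs = elapsed time (s) × cores × $10^{4}$ / 3600; this formula is not stated on the page but is inferred from the tabulated values, and the function name is only illustrative.

```python
def service_units(elapsed_s, n_cores, n_steps=10_000):
    """Core hours needed for n_steps time steps.

    elapsed_s is the wall clock time per step in seconds.
    Assumed formula (inferred from the tables, not stated on the page).
    """
    return elapsed_s * n_cores * n_steps / 3600.0


# First row of the first strong scaling table: 64 cores, 3.26451 s per step
print(round(service_units(3.26451, 64), 3))  # 580.357
```

The same function applied to the other rows should agree with the SUs column to within the rounding of the printed values.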

Strong Scaling Results

$N_{med}$	$N_{\phi}$
53280	32

# of Cores	# of parallel FEM	# of parallel FFT	Elapsed time	Solver time	Comm. time	Efficiency	SUs
64	8	8	3.26451	0.112633	0.0521764	0.958659	580.357
64	16	4	5.39355	0.134547	0.0933418	0.580239	958.854
128	8	16	1.85212	0.108784	0.0344249	0.844854	658.533
128	16	8	1.84533	0.0709971	0.0290995	0.847966	656.116
256	8	32	1.18584	0.427434	0.0484521	0.659775	843.263
256	16	16	1.04266	0.0631451	0.023694	0.750375	741.448
256	32	8	1.20359	0.0506269	0.0211725	0.650045	855.886
512	16	32	0.707405	0.248459	0.0314082	0.552998	1006.09
512	32	16	0.732251	0.0445155	0.0205402	0.534234	1041.42
512	64	8	0.649872	0.119073	0.00713214	0.601955	924.262

$N_{med}$	$N_{\phi}$
132587	128

# of Cores	# of parallel FEM	# of parallel FFT	Elapsed time	Solver time	Comm. time	Efficiency	SUs
512	32	16	5.347	0.195319	0.162323	0.852482	7604.62
512	16	32	4.55822	0.225011	0.18032	1	6482.8
1024	64	16	3.99642	0.162858	0.117593	0.570288	11367.6
1024	32	32	2.30393	0.126079	0.107139	0.989228	6553.4
1024	16	64	3.2378	0.214095	0.144102	0.703906	9209.75
2048	64	32	1.40748	0.0818107	0.0677741	0.809641	8007.01
2048	32	64	1.51652	0.111759	0.0867726	0.75143	8627.29
2048	16	128	2.39379	0.338092	0.107656	0.476046	13618
4096	128	32	1.02445	0.0618993	0.0500832	0.55618	11655.9
4096	64	64	0.954898	0.0737486	0.0606541	0.59669	10864.6
4096	32	128	1.12072	0.180406	0.0659313	0.508405	12751.3

Elapsed (wall clock) time for the strong scaling. The numbers in the plot give the degree of FFT parallelization.

Parallel efficiency for the strong scaling. The numbers in the plot give the degree of FFT parallelization.
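The Efficiency column of the second strong scaling table ($N_{med}$ = 132587, $N_{\phi}$ = 128) is consistent with the usual definition relative to a reference run, $E = (T_{ref} N_{ref}) / (T N)$, with the 512-core run using 16 FEM subdomains and 32-way FFT parallelization as the reference (the row with efficiency 1). This definition is inferred from the numbers, not stated on the page; the sketch below checks it against a few rows. For the first table no listed row has efficiency 1, so its reference run is not reproduced here.

```python
def parallel_efficiency(elapsed_s, n_cores, ref_elapsed_s, ref_cores):
    """Strong scaling efficiency relative to a reference run (assumed definition)."""
    return (ref_elapsed_s * ref_cores) / (elapsed_s * n_cores)


# Reference: 512 cores, 16 FEM x 32 FFT, 4.55822 s per step
REF_ELAPSED, REF_CORES = 4.55822, 512

# (cores, parallel FEM, parallel FFT, elapsed time) copied from the table
rows = [
    (1024, 32, 32, 2.30393),
    (2048, 64, 32, 1.40748),
    (4096, 64, 64, 0.954898),
]

for cores, fem, fft, elapsed in rows:
    eff = parallel_efficiency(elapsed, cores, REF_ELAPSED, REF_CORES)
    print(cores, fem, fft, round(eff, 4))
```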

Weak Scaling Results

# of Cores	# of parallel FEM	# of parallel FFT	$N_{med}$	$N_{\phi}$	Elapsed time	Solver time	Comm. time	SUs
128	32	4	132587	16	4.45475	0.100929	0.0287522	1583.91
256	32	8	132587	32	2.82371	0.101547	0.0439264	2007.97
512	32	16	132587	64	2.41257	0.117023	0.0525979	3431.22
1024	32	32	132587	128	2.30393	0.126079	0.107139	6553.4
2048	32	64	132587	256	2.39006	0.126997	0.149923	13596.8

Elapsed time for the weak scaling in the zonal direction.

# of Cores	# of parallel FEM	# of parallel FFT	$N_{med}$	$N_{\phi}$	Elapsed time	Solver time	Comm. time	SUs
256	16	16	7620	64	0.498492	0.0331437	0.0233114	354.484
1024	64	16	30667	64	0.734665	0.0432489	0.0237405	2089.71
2304	144	16	67412	64	1.05416	0.0557186	0.0527788	6746.65
4096	256	16	119590	64	1.26638	0.0689046	0.0430469	14408.5

Elapsed time for the weak scaling in the meridional direction. The $O(N_{core}^{1/2})$ scaling is plotted as a dotted line.
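To compare the meridional weak scaling data with the $O(N_{core}^{1/2})$ reference line, one can fit a power law $T \propto N_{core}^{p}$ by least squares in log-log space. The sketch below is only an illustration and not the procedure used to produce the figure; it takes the core count of each run as the product of the parallel FEM and parallel FFT columns, and the function name is hypothetical.

```python
import math

# (parallel FEM, parallel FFT, elapsed time per step in seconds) from the table
rows = [
    (16, 16, 0.498492),
    (64, 16, 0.734665),
    (144, 16, 1.05416),
    (256, 16, 1.26638),
]


def fit_exponent(rows):
    """Least-squares slope of log(elapsed time) against log(core count)."""
    xs = [math.log(fem * fft) for fem, fft, _ in rows]
    ys = [math.log(elapsed) for _, _, elapsed in rows]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


print(f"fitted exponent p = {fit_exponent(rows):.2f} (reference line: 0.50)")
```

A fitted exponent below 1/2 would mean the elapsed time grows more slowly than the dotted reference line over this range of core counts.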
