wg:dynamo:performance_results:calypso

compile options

F90OPTFLAGS = -O3 -warn all -g -xhost -openmp

Name	Description
# of Cores	Number of used CPU cores
# of Processes	Number of MPI processes
# of Threads	Number of threads for each process
$l_{max}$	Truncation level for spherical harmonics
$(N_{r},N_{\theta},N_{\phi})$	Number of grid points in spherical coordinates
Elapsed	Elapsed (wall clock) time for one time step
Nonlinear	Elapsed (wall clock) time for nonlinear terms (including communications)
Solver	Elapsed (wall clock) time for linear calculation
Comm.	Elapsed (wall clock) time for data communication
Efficiency	Parallel efficiency
SUs	Service units for $10^{4}$ time steps (core hours)
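
The SUs column can be reproduced from the Elapsed column and the core count. The following is a minimal sketch, assuming that one service unit equals one core hour and that SUs are simply the elapsed time per step scaled to $10^{4}$ steps; the function name `service_units` is hypothetical and not part of Calypso.

```python
def service_units(elapsed_per_step, n_cores, n_steps=10_000):
    """Core hours needed for n_steps time steps.

    elapsed_per_step: wall clock seconds for one time step
    n_cores:          number of CPU cores used (processes x threads)
    """
    seconds_total = elapsed_per_step * n_steps
    return seconds_total * n_cores / 3600.0


# 256 cores (32 processes x 8 threads), 7.25365 s per step
print(round(service_units(7.25365, 256), 2))  # 5158.15
```

The same formula applied to the $l_{max} = 47$ run (1.604797 s per step, 71.3243 SUs) implies that it used 16 cores.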

$l_{max}$	$(N_{r},N_{\theta},N_{\phi})$	Elapsed	Nonlinear	Solver	Comm.	SUs
47	( 73,72,144)	1.604797	1.56274	0.042059	0.508469	71.3243

$l_{max}$	$(N_{r},N_{\theta},N_{\phi})$
255	(513,384,768)

# of Cores	# of Processes	# of Threads	Elapsed	Nonlinear	Solver	Comm.	Efficiency	SUs
256	32	8	7.25365	7.20543	0.0482177	1.09003	0.940665	5158.15
256	64	4	6.82326	6.7743	0.0489558	0.794641	1	4852.09
256	128	2	6.74711	6.69947	0.0476359	0.72289	1.01129	4797.94
512	64	8	3.57915	3.5559	0.0232448	0.541038	0.953195	5090.34
512	128	4	2.13608	2.11468	0.0214005	0.481005	1.59714	3037.98
512	256	2	2.11996	2.09772	0.0222343	0.430132	1.60929	3015.06
1024	128	8	1.76799	1.75701	0.010974	0.300582	0.964834	5028.94
1024	256	4	1.33388	1.32335	0.01052	0.514778	1.27884	3794.14
1024	512	2	2.31397	2.30589	0.00807709	1.36341	0.73718	6581.96
2048	256	8	0.838836	0.833227	0.00560799	0.195589	1.01677	4772.05
2048	512	4	0.81061	0.805883	0.0047257	0.168418	1.05218	4611.47
2048	1024	2	0.672909	0.668731	0.00417642	0.162079	1.26749	3828.1
4096	512	8	0.386554	0.383808	0.00274551	0.101968	1.10322	4398.13
4096	1024	4	0.3577	0.355273	0.00242621	0.105497	1.19221	4069.83

Elapsed (wall clock) time for the strong scaling. The numbers indicate the number of OpenMP threads per process. Ideal scaling is plotted as a dotted line.

Parallel efficiency for the strong scaling. The numbers indicate the number of OpenMP threads per process.
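
The tabulated efficiencies are consistent with taking the run on 256 cores (64 processes, 4 threads each; efficiency 1) as the reference and comparing core-seconds per step. The following is a minimal sketch under that assumption; the function name and default arguments are hypothetical.

```python
def parallel_efficiency(elapsed, n_cores, ref_elapsed=6.82326, ref_cores=256):
    """Strong scaling efficiency relative to a reference run.

    Ratio of the core-seconds per step of the reference run to those of
    the run being evaluated. Values above 1 mean the run uses fewer
    core-seconds per step than the reference.
    """
    return (ref_elapsed * ref_cores) / (elapsed * n_cores)


# (cores, elapsed seconds per step) for a few rows of the strong scaling table
for n_cores, elapsed in [(256, 7.25365), (512, 2.13608), (4096, 0.3577)]:
    print(n_cores, round(parallel_efficiency(elapsed, n_cores), 4))
```

Several runs exceed an efficiency of 1 under this definition, because the reference is one of the smallest runs in the table rather than a single-core run.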

# of Cores	# of Processes	# of Threads	$l_{max}$	$(N_{r},N_{\theta},N_{\phi})$	Elapsed	Nonlinear	Solver	Comm.	SUs
16	4	4	31	(513,48,96)	0.500935	0.489675	0.0112569	0.129805	22.2638
64	16	4	63	(513,96,192)	0.60037	0.591241	0.00912827	0.21034	106.733
256	64	4	127	(513,192,384)	0.730134	0.72047	0.00966272	0.203059	519.207
1024	256	4	255	(513,384,768)	1.33388	1.32335	0.01052	0.514778	3794.14
4096	1024	4	511	(513,768,1536)	1.94168	1.93247	0.00921192	0.57891	22092

Elapsed time for the weak scaling in the horizontal resolution. Results with 4 OpenMP threads per process are shown.
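
As a quick check that this series keeps the grid size per core fixed, the sketch below computes the number of horizontal grid points per core from the table values. The helper name `horizontal_points_per_core` is hypothetical; the check covers only the grid size and says nothing about how the spectral transform cost grows with $l_{max}$.

```python
# (cores, N_theta, N_phi) copied from the weak scaling table above
runs = [
    (16, 48, 96),
    (64, 96, 192),
    (256, 192, 384),
    (1024, 384, 768),
    (4096, 768, 1536),
]


def horizontal_points_per_core(n_cores, n_theta, n_phi):
    """Number of (theta, phi) grid points per core on one radial level."""
    return n_theta * n_phi / n_cores


for n_cores, n_theta, n_phi in runs:
    print(n_cores, horizontal_points_per_core(n_cores, n_theta, n_phi))
```

Every run gives 288 horizontal points per core, and $N_{r}$ is fixed at 513, so the total number of grid points per core is the same in all five runs.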

# of Cores	# of Processes	# of Threads	$l_{max}$	$(N_{r},N_{\theta},N_{\phi})$	Elapsed	Nonlinear	Solver	Comm.	SUs
128	32	4	255	(33,384,768)	0.536611	0.53181	0.00479917	0.718157	190.795
256	64	4	255	(65,384,768)	0.69748	0.69268	0.00479858	0.171752	495.986
512	128	4	255	(129,384,768)	0.694585	0.0689725	0.00444602	0.720039	987.854
1024	256	4	255	(257,384,768)	0.809201	0.804243	0.0049558	0.203059	2301.73
2048	512	4	255	(513,384,768)	0.81061	0.805883	0.0047257	0.168418	4611.47
4096	1024	4	255	(1025,384,768)	0.809201	0.804471	0.00472797	0.17441	9206.91

Elapsed time for the weak scaling in the radial resolution. Results with 4 OpenMP threads per process are shown.
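
One way to summarize this series is a weak scaling efficiency, taken here as the elapsed time of the smallest run divided by the elapsed time of each larger run. This definition is a common convention and an assumption of this sketch, not something stated in the table; the function name is hypothetical.

```python
# (cores, N_r, elapsed seconds per step) copied from the table above
radial_runs = [
    (128, 33, 0.536611),
    (256, 65, 0.69748),
    (512, 129, 0.694585),
    (1024, 257, 0.809201),
    (2048, 513, 0.81061),
    (4096, 1025, 0.809201),
]


def weak_scaling_efficiency(elapsed, ref_elapsed):
    """Reference elapsed time divided by the elapsed time of this run."""
    return ref_elapsed / elapsed


ref_elapsed = radial_runs[0][2]  # 128-core run as reference
for n_cores, n_r, elapsed in radial_runs:
    intervals_per_core = (n_r - 1) / n_cores  # radial intervals per core
    eff = weak_scaling_efficiency(elapsed, ref_elapsed)
    print(n_cores, intervals_per_core, round(eff, 3))
```

The number of radial intervals per core is 0.25 in every run, which confirms that the work per core is held fixed while $N_{r}$ grows.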