wg:dynamo:performance_results:calypso_latest

Back to performance benchmark lists

Compile options

F90OPTFLAGS = -O3 -warn all -g -xhost -openmp

Definition of columns

Name Description
# of Cores Number of CPU cores used
# of Processes Number of MPI processes
# of Threads Number of threads per process
$l_{max}$ Truncation level for spherical harmonics
$(N_{r},N_{\theta},N_{\phi})$ Number of grid points in spherical coordinates
Elapsed Elapsed (wall clock) time for one time step
Nonlinear Elapsed (wall clock) time for nonlinear terms (including communications)
Solver Elapsed (wall clock) time for linear calculation
Comm. Elapsed (wall clock) time for data communication
Efficiency Parallel efficiency
SUs Service units for $10^{4}$ time steps (core hours)
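
The SUs column can be reproduced from the Elapsed column and the core count. The following is a minimal sketch, assuming that Elapsed is given in seconds per time step and that SUs are core hours for $10^{4}$ steps. The formula is inferred from the tabulated values rather than stated on this page, and the function name `service_units` is hypothetical.

```python
def service_units(elapsed_per_step, n_cores, n_steps=10_000):
    """Core hours needed for n_steps time steps.

    Assumption: elapsed_per_step is wall clock time in seconds.
    """
    seconds_per_hour = 3600.0
    return elapsed_per_step * n_cores * n_steps / seconds_per_hour


# Strong scaling run on 256 cores with 2.30595 s per step
# (the table lists 1639.78 SUs for this run).
print(round(service_units(2.30595, 256), 2))
```

The same function, applied to the 16-core weak scaling run (0.345327 s per step), agrees with the tabulated 15.3479 SUs to the digits shown.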

Single Processor Result

$l_{max}$ $(N_{r},N_{\theta},N_{\phi})$ Elapsed Nonlinear Solver Comm. SUs
47 ( 73,72,144) 1.604797 1.56274 0.042059 0.508469 71.3243

Strong Scaling Results

$l_{max}$ $(N_{r},N_{\theta},N_{\phi})$
255 (513,384,768)
# of Cores # of Processes # of Threads Elapsed Nonlinear Solver Comm. Efficiency SUs
256 32 8 2.30595 2.25802 0.0409069 0.632535 1 1639.78
256 64 4
256 128 2
512 64 8 1.09925 1.07626 0.0229863 0.294752 0.0104887 1563.38
512 128 4 1.08382 1.06144 0.0223762 0.352703 0.010638 1541.44
512 256 2 0.950071 0.928156 0.0219641 0.217071 0.0121356 1351.21
1024 128 8 0.574765 0.563144 0.0116204 0.180746 0.00651564 1634.89
1024 256 4 0.523838 0.513385 0.0104525 0.175344 0.011005 1490.03
1024 512 2 0.515151 0.504822 0.0103284 0.162658 0.0111906 1465.32
2048 256 8 0.295707 0.290006 0.00570012 0.112858 0.00974759 1682.25
2048 512 4 0.276864 0.271508 0.00535592 0.118606 0.010411 1575.05
2048 1024 2 0.261681 0.256766 0.00491518 0.103864 0.011015 1488.68
4096 512 8 0.15379 0.150815 0.00297473 0.187843 0.00937133 1749.79
4096 1024 4 0.154584 0.151774 0.00280978 0.0897712 0.00932318 1758.83


Elapsed (wall clock) time for strong scaling. The numbers on the plot indicate the number of OpenMP threads. Ideal scaling is plotted as a dotted line.


Parallel efficiency for strong scaling. The numbers on the plot indicate the number of OpenMP threads.
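
Parallel efficiency for strong scaling can be computed from the elapsed times. The following is a sketch assuming the usual definition relative to the smallest run, $E = (T_{ref} N_{ref}) / (T N)$, with the 256-core, 32-process run as the reference. The page does not state the definition it uses, and the function name `parallel_efficiency` is hypothetical. For most runs above 256 cores, the tabulated Efficiency values appear to be this ratio divided by 100 (for example, 0.0104887 in the table against about 1.04887 computed here for the 512-core, 64-process run).

```python
def parallel_efficiency(t_ref, n_ref, t, n):
    """Strong scaling efficiency relative to a reference run.

    t_ref, t: elapsed time per step; n_ref, n: number of cores.
    """
    return (t_ref * n_ref) / (t * n)


# Reference: 256 cores, 32 processes x 8 threads.
T_REF, N_REF = 2.30595, 256

# (cores, elapsed) for the runs with 4 threads per process, taken from the table.
runs = [(512, 1.08382), (1024, 0.523838), (2048, 0.276864), (4096, 0.154584)]

for n_cores, elapsed in runs:
    eff = parallel_efficiency(T_REF, N_REF, elapsed, n_cores)
    print(f"{n_cores:5d} cores: efficiency = {eff:.4f}")
```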

Weak Scaling Results

# of Cores # of Processes # of Threads $l_{max}$ $(N_{r},N_{\theta},N_{\phi})$ Elapsed Nonlinear Solver Comm. SUs
16 4 4 31 (513,48,96) 0.345327 0.334976 0.0103503 0.0346996 15.3479
64 16 4 63 (513,96,192) 0.377983 0.367511 0.0104712 0.0701478 67.197
256 64 4 127 (513,192,384) 0.506746 0.496344 0.0104008 0.215548 360.352
1024 256 4 255 (513,384,768) 0.523838 0.513385 0.0104525 0.175344 1490.03
4096 1024 4 511 (513,768,1536) 0.744473 0.733799 0.010673 0.386788 8470.44


Elapsed time for weak scaling in the horizontal resolution. Results with 4 OpenMP threads are shown. Ideal scaling for the Legendre transform is plotted as a dotted line.

# of Cores # of Processes # of Threads $l_{max}$ $(N_{r},N_{\theta},N_{\phi})$ Elapsed Nonlinear Solver Comm. SUs
128 32 4 255 (33,384,768) 0.253204 0.248189 0.00501417 0.100608 90.0281
256 64 4 255 (65,384,768) 0.261203 0.256194 0.00500897 0.0883585 185.744
512 128 4 255 (129,384,768) 0.266168 0.261061 0.00510643 0.0962178 378.549
1024 256 4 255 (257,384,768) 0.303394 0.298234 0.00515566 0.145423 760.043
2048 512 4 255 (513,384,768) 0.276864 0.271508 0.00535592 0.118606 1575.05
4096 1024 4 255 (1025,384,768) 0.279425 0.27406 0.00536459 0.127257 3179.23


Elapsed time for weak scaling in the radial resolution. Results with 4 OpenMP threads are shown.
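
The radial weak scaling results can be summarized as a weak scaling efficiency. The following is a sketch assuming the common definition $E = T_{ref} / T$, with the 128-core run as the reference, and assuming that the work per core stays roughly constant because $N_{r}$ and the core count double together. Neither assumption is stated on this page, and the names used are hypothetical.

```python
def weak_scaling_efficiency(t_ref, t):
    """Ratio of reference elapsed time to elapsed time at larger scale."""
    return t_ref / t


# (cores, N_r, elapsed) from the radial weak scaling table, 4 threads per process.
radial_runs = [
    (128, 33, 0.253204),
    (256, 65, 0.261203),
    (512, 129, 0.266168),
    (1024, 257, 0.303394),
    (2048, 513, 0.276864),
    (4096, 1025, 0.279425),
]

t_ref = radial_runs[0][2]
efficiencies = {
    n_cores: weak_scaling_efficiency(t_ref, elapsed)
    for n_cores, _n_r, elapsed in radial_runs
}

for n_cores, eff in efficiencies.items():
    print(f"{n_cores:5d} cores: weak scaling efficiency = {eff:.3f}")
```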

wg/dynamo/performance_results/calypso_latest.txt · Last modified: 2018/11/28 21:53 (external edit)