
wg:dynamo:performance_results:rayleigh

Back to performance benchmark lists

Compile options

F90FLAGS = -FR -fpp -r8 -O3 -xAVX -shared_intel -I$(MKLROOT)/include -I$(MKLROOT)/include/fftw

Definition of columns

Name Definition
# of Cores Number of CPU cores used
# of Processes Number of MPI processes
# of Threads Number of threads per process (listed as "# of SMP" in the tables)
$l_{max}$ Truncation level for spherical harmonics
$N_{C}$ Truncation level for Chebyshev polynomials
$(N_{r},N_{\theta},N_{\phi})$ Number of grid points in spherical coordinates
Elapsed Elapsed (wall clock) time in seconds for one time step
Nonlinear Elapsed (wall clock) time for nonlinear terms (including communications)
Solver Elapsed (wall clock) time for linear calculation
Comm. Elapsed (wall clock) time for data communication
Efficiency Parallel efficiency relative to the run with the fewest cores
SUs Service units (core hours) for $10^{4}$ time steps
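The SUs column can be reproduced from the Elapsed column. The sketch below assumes that SUs = cores × elapsed seconds per step × $10^{4}$ steps / 3600; this formula is not stated on the page but is inferred from the strong scaling rows, which it matches. The function name `service_units` is hypothetical.

```python
def service_units(cores, elapsed_s, steps=10_000):
    """Core hours used by `steps` time steps on `cores` cores.

    Assumes `elapsed_s` is the wall clock time per step in seconds.
    """
    return cores * elapsed_s * steps / 3600.0


# 64-core and 4096-core rows of the strong scaling table
print(round(service_units(64, 5.9551), 2))
print(round(service_units(4096, 0.185785), 2))
```

The same function gives a quick cost estimate for a planned run: pass the measured per-step time and the intended number of steps.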

Single Processor Result

$N_{C}$ $l_{max}$ $(N_{r},N_{\theta},N_{\phi})$ Elapsed Nonlinear Solver Comm. SUs
48 47 (73,72,144) 1.604797 1.56274 0.042059 0.508469 71.3243

Strong Scaling Results

$N_{C}$ $l_{max}$ $(N_{r},N_{\theta},N_{\phi})$
255 (512,384,768)
# of Cores # of Processes # of SMP Elapsed Nonlinear Solver Comm. Efficiency SUs
64 64 1 5.9551 2.97054 0.77414 2.21042 1 1058.68
128 128 1 3.12294 1.67361 0.436457 1.01287 0.953444 1110.38
256 256 1 1.49993 0.793621 0.224334 0.481976 0.992562 1066.62
512 512 1 0.901664 0.441342 0.145236 0.315087 0.82557 1282.37
1024 1024 1 0.457387 0.219177 0.0752801 0.16293 0.813738 1301.01
2048 2048 1 0.322376 0.136652 0.0529017 0.132822 0.577267 1833.96
4096 4096 1 0.185785 0.0691138 0.0287718 0.0878993 0.500839 2113.82
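The Efficiency column is consistent with the usual strong scaling definition, taking the 64-core run as the baseline: efficiency = (base cores × base elapsed) / (cores × elapsed). The following is a minimal sketch under that assumption; the names `RUNS`, `speedup` and `parallel_efficiency` are hypothetical, and the data are copied from the table above.

```python
# (cores, elapsed seconds per step) from the strong scaling table
RUNS = [
    (64, 5.9551),
    (128, 3.12294),
    (256, 1.49993),
    (512, 0.901664),
    (1024, 0.457387),
    (2048, 0.322376),
    (4096, 0.185785),
]


def speedup(elapsed_s, base_elapsed_s):
    """How many times faster a run is than the baseline run."""
    return base_elapsed_s / elapsed_s


def parallel_efficiency(cores, elapsed_s, base_cores, base_elapsed_s):
    """Speedup divided by the increase in core count."""
    return speedup(elapsed_s, base_elapsed_s) * base_cores / cores


base_cores, base_elapsed = RUNS[0]
for cores, elapsed in RUNS:
    s = speedup(elapsed, base_elapsed)
    e = parallel_efficiency(cores, elapsed, base_cores, base_elapsed)
    print(f"{cores:5d} cores: speedup {s:6.2f}, efficiency {e:.6f}")
```

An efficiency of 1 means the elapsed time halves every time the core count doubles; the table shows it dropping to about 0.5 at 4096 cores.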


Elapsed (wall clock) time for the strong scaling. The numbers indicate the number of OpenMP threads. Ideal scaling is plotted as a dotted line.


Parallel efficiency for the strong scaling. The numbers indicate the number of OpenMP threads.

Weak Scaling Results

# of Cores # of Processes # of SMP $l_{max}$ $(N_{r},N_{\theta},N_{\phi})$ Elapsed Nonlinear Solver Comm. SUs
16 4 4 31 (513,48,96) 0.345327 0.334976 0.0103503 0.0346996 15.3479
64 16 4 63 (513,96,192) 0.377983 0.367511 0.0104712 0.0701478 67.197
256 64 4 127 (513,192,384) 0.506746 0.496344 0.0104008 0.215548 360.352
1024 256 4 255 (513,384,768) 0.523838 0.513385 0.0104525 0.175344 1490.03
4096 1024 4 511 (513,768,1536) 0.744473 0.733799 0.010673 0.386788 8470.44


Elapsed time for the weak scaling in the horizontal resolution. The results with 4 OpenMP threads are shown. Ideal scaling for the Legendre transform is plotted as a dotted line.

# of Cores # of Processes # of SMP $l_{max}$ $(N_{r},N_{\theta},N_{\phi})$ Elapsed Nonlinear Solver Comm. SUs
128 32 4 255 (33,384,768) 0.253204 0.248189 0.00501417 0.100608 90.0281
256 64 4 255 (65,384,768) 0.261203 0.256194 0.00500897 0.0883585 185.744
512 128 4 255 (129,384,768) 0.266168 0.261061 0.00510643 0.0962178 378.549
1024 256 4 255 (257,384,768) 0.303394 0.298234 0.00515566 0.145423 760.043
2048 512 4 255 (513,384,768) 0.276864 0.271508 0.00535592 0.118606 1575.05
4096 1024 4 255 (1025,384,768) 0.279425 0.27406 0.00536459 0.127257 3179.23
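The page does not report an efficiency for the weak scaling runs. One common way to summarize them, shown here as a sketch rather than the author's method, is the ratio of the baseline elapsed time to the elapsed time of each larger run, where 1 means the time per step stays flat as cores and radial grid points grow together. The names `RADIAL_RUNS` and `weak_scaling_efficiency` are hypothetical; the data are copied from the radial resolution table above.

```python
# (cores, N_r, elapsed seconds per step) from the radial weak scaling table
RADIAL_RUNS = [
    (128, 33, 0.253204),
    (256, 65, 0.261203),
    (512, 129, 0.266168),
    (1024, 257, 0.303394),
    (2048, 513, 0.276864),
    (4096, 1025, 0.279425),
]


def weak_scaling_efficiency(elapsed_s, base_elapsed_s):
    """Baseline time per step divided by the time per step of this run."""
    return base_elapsed_s / elapsed_s


base_elapsed = RADIAL_RUNS[0][2]
for cores, n_r, elapsed in RADIAL_RUNS:
    e = weak_scaling_efficiency(elapsed, base_elapsed)
    print(f"{cores:5d} cores, N_r = {n_r:4d}: efficiency {e:.3f}")
```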


Elapsed time for the weak scaling in the radial resolution. The results with 4 OpenMP threads are shown.


wg/dynamo/performance_results/rayleigh.txt · Last modified: 2018/11/28 21:54 (external edit)