[[wg:dynamo:Performance_results|Back to performance benchmark lists]] \\

===== Compile options =====

  F90FLAGS = -FR -fpp -r8 -O3 -xAVX -shared_intel -I$(MKLROOT)/include -I$(MKLROOT)/include/fftw

===== Definition of columns =====

^ Name ^ Description ^
| # of Cores | Number of CPU cores used |
| # of Processes | Number of MPI processes |
| # of SMP | Number of OpenMP threads for each process |
| $l_{max}$ | Truncation level for spherical harmonics |
| $N_{C}$ | Truncation level for Chebyshev polynomials |
| $(N_{r},N_{\theta},N_{\phi})$ | Number of grid points in spherical coordinates |
| Elapsed | Elapsed (wall clock) time for one time step |
| Nonlinear | Elapsed (wall clock) time for nonlinear terms (including communications) |
| Solver | Elapsed (wall clock) time for linear calculation |
| Comm. | Elapsed (wall clock) time for data communication |
| Efficiency | Parallel efficiency |
| SUs | Service units for $10^{4}$ time steps (core hours) |

===== Single Processor Result =====

^ $N_{C}$ ^ $l_{max}$ ^ $(N_{r},N_{\theta},N_{\phi})$ ^ Elapsed ^ Nonlinear ^ Solver ^ Comm. ^ SUs ^
| 48 | 47 | (73,72,144) | 1.604797 | 1.56274 | 0.042059 | 0.508469 | 71.3243 |

===== Strong Scaling Results =====

^ $N_{C}$ ^ $l_{max}$ ^ $(N_{r},N_{\theta},N_{\phi})$ ^
| 255 | (512,384,768) |

^ # of Cores ^ # of Processes ^ # of SMP ^ Elapsed ^ Nonlinear ^ Solver ^ Comm. ^ Efficiency ^ SUs ^
| 64 | 64 | 1 | 5.9551 | 2.97054 | 0.77414 | 2.21042 | 1 | 1058.68 |
| 128 | 128 | 1 | 3.12294 | 1.67361 | 0.436457 | 1.01287 | 0.953444 | 1110.38 |
| 256 | 256 | 1 | 1.49993 | 0.793621 | 0.224334 | 0.481976 | 0.992562 | 1066.62 |
| 512 | 512 | 1 | 0.901664 | 0.441342 | 0.145236 | 0.315087 | 0.82557 | 1282.37 |
| 1024 | 1024 | 1 | 0.457387 | 0.219177 | 0.0752801 | 0.16293 | 0.813738 | 1301.01 |
| 2048 | 2048 | 1 | 0.322376 | 0.136652 | 0.0529017 | 0.132822 | 0.577267 | 1833.96 |
| 4096 | 4096 | 1 | 0.185785 | 0.0691138 | 0.0287718 | 0.0878993 | 0.500839 | 2113.82 |

{{wg:dynamo:Performance_results:Rayleigh:Rayleigh_Elapsed.png?480}}\\
Elapsed (wall clock) time for the strong scaling. The number of OpenMP threads is indicated by the numbers in the plot. Ideal scaling is plotted as a dotted line.

{{wg:dynamo:Performance_results:Rayleigh:Rayleigh_efficiency.png?480}}\\
Parallel efficiency for the strong scaling. The number of OpenMP threads is indicated by the numbers in the plot.

===== Weak Scaling Results =====

^ # of Cores ^ # of Processes ^ # of SMP ^ $l_{max}$ ^ $(N_{r},N_{\theta},N_{\phi})$ ^ Elapsed ^ Nonlinear ^ Solver ^ Comm. ^ SUs ^
| 16 | 4 | 4 | 31 | (513,48,96) | 0.345327 | 0.334976 | 0.0103503 | 0.0346996 | 15.3479 |
| 64 | 16 | 4 | 63 | (513,96,192) | 0.377983 | 0.367511 | 0.0104712 | 0.0701478 | 67.197 |
| 256 | 64 | 4 | 127 | (513,192,384) | 0.506746 | 0.496344 | 0.0104008 | 0.215548 | 360.352 |
| 1024 | 256 | 4 | 255 | (513,768,1536) | 0.523838 | 0.513385 | 0.0104525 | 0.175344 | 1490.03 |
| 4096 | 1024 | 4 | 511 | (513,768,1536) | 0.744473 | 0.733799 | 0.010673 | 0.386788 | 8470.44 |

{{wg:dynamo:Performance_results:Rayleigh:Rayleigh_weak_sph.png?480}}\\
Elapsed time for the weak scaling in the horizontal resolution. The results with 4 OpenMP threads are shown. The ideal scaling for the Legendre transform is plotted as a dotted line.
\\

^ # of Cores ^ # of Processes ^ # of SMP ^ $l_{max}$ ^ $(N_{r},N_{\theta},N_{\phi})$ ^ Elapsed ^ Nonlinear ^ Solver ^ Comm. ^ SUs ^
| 128 | 32 | 4 | 255 | (33,384,768) | 0.253204 | 0.248189 | 0.00501417 | 0.100608 | 90.0281 |
| 256 | 64 | 4 | 255 | (65,384,768) | 0.261203 | 0.256194 | 0.00500897 | 0.0883585 | 185.744 |
| 512 | 128 | 4 | 255 | (129,384,768) | 0.266168 | 0.261061 | 0.00510643 | 0.0962178 | 378.549 |
| 1024 | 256 | 4 | 255 | (257,384,768) | 0.303394 | 0.298234 | 0.00515566 | 0.145423 | 760.043 |
| 2048 | 512 | 4 | 255 | (513,384,768) | 0.276864 | 0.271508 | 0.00535592 | 0.118606 | 1575.05 |
| 4096 | 1024 | 4 | 255 | (1025,384,768) | 0.279425 | 0.27406 | 0.00536459 | 0.127257 | 3179.23 |

{{wg:dynamo:Performance_results:Rayleigh:Rayleigh_weak_r.png?480}}\\
Elapsed time for the weak scaling in the radial resolution. The results with 4 OpenMP threads are shown.\\

[[wg:dynamo:Performance_results|Back to performance benchmark lists]] \\
[[wg:dynamo:Performance_results:Rayleigh:files|files]]
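
This page does not give formulas for the Efficiency and SUs columns. The sketch below assumes that efficiency is measured relative to the 64-core strong scaling run, and that SUs are cores × elapsed time per step × $10^{4}$ steps, converted to hours. Both assumptions appear to reproduce the tabulated values to within rounding. The function and variable names are hypothetical and are not taken from the benchmark scripts.

```python
# Reference run (assumption): the 64-core row of the strong scaling table.
REF_CORES = 64
REF_ELAPSED = 5.9551  # wall clock seconds per time step


def parallel_efficiency(cores, elapsed,
                        ref_cores=REF_CORES, ref_elapsed=REF_ELAPSED):
    """Strong scaling efficiency relative to the reference run."""
    return (ref_cores * ref_elapsed) / (cores * elapsed)


def service_units(cores, elapsed, n_steps=10_000):
    """Core hours consumed by n_steps time steps."""
    return cores * elapsed * n_steps / 3600.0


# (cores, elapsed) pairs copied from the strong scaling table.
strong_scaling = [
    (64, 5.9551),
    (128, 3.12294),
    (256, 1.49993),
    (512, 0.901664),
    (1024, 0.457387),
    (2048, 0.322376),
    (4096, 0.185785),
]

for cores, elapsed in strong_scaling:
    eff = parallel_efficiency(cores, elapsed)
    sus = service_units(cores, elapsed)
    print(f"{cores:5d} cores: efficiency = {eff:.4f}, SUs = {sus:.2f}")
```

The same ''service_units'' formula also appears to match the SUs column of the weak scaling tables, for example the 16-core row with an elapsed time of 0.345327 s.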