[[wg:dynamo:Performance_results|Back to performance benchmark lists]] \\

===== Compile options =====

  F90OPTFLAGS = -O3 -warn all -g -xhost -openmp

===== Definition of columns =====

^ Name ^ Description ^
| # of Cores | Number of CPU cores used |
| # of Processes | Number of MPI processes |
| # of SMP | Number of OpenMP threads for each MPI process |
| $l_{max}$ | Truncation level of the spherical harmonics |
| $(N_{r},N_{\theta},N_{\phi})$ | Number of grid points in the $r$, $\theta$, and $\phi$ directions of the spherical coordinates |
| Elapsed | Elapsed (wall-clock) time in seconds for one time step |
| Nonlinear | Elapsed (wall-clock) time for the nonlinear terms (including communication) |
| Solver | Elapsed (wall-clock) time for the linear calculation |
| Comm. | Elapsed (wall-clock) time for data communication |
| Efficiency | Parallel efficiency |
| SUs | Service units for $10^{4}$ time steps (core hours) |

===== Single Processor Result =====

^ $l_{max}$ ^ $(N_{r},N_{\theta},N_{\phi})$ ^ Elapsed ^ Nonlinear ^ Solver ^ Comm. ^ SUs ^
| 47 | (73,72,144) | 1.604797 | 1.56274 | 0.042059 | 0.508469 | 71.3243 |

===== Strong Scaling Results =====

^ $l_{max}$ ^ $(N_{r},N_{\theta},N_{\phi})$ ^
| 255 | (513,384,768) |

^ # of Cores ^ # of Processes ^ # of SMP ^ Elapsed ^ Nonlinear ^ Solver ^ Comm. ^ Efficiency ^ SUs ^
| 256 | 32 | 8 | 2.30595 | 2.25802 | 0.0409069 | 0.632535 | 1 | 1639.78 |
| 256 | 64 | 4 |  |  |  |  |  |  |
| 256 | 128 | 2 |  |  |  |  |  |  |
| 512 | 64 | 8 | 1.09925 | 1.07626 | 0.0229863 | 0.294752 | 0.0104887 | 1563.38 |
| 512 | 128 | 4 | 1.08382 | 1.06144 | 0.0223762 | 0.352703 | 0.010638 | 1541.44 |
| 512 | 256 | 2 | 0.950071 | 0.928156 | 0.0219641 | 0.217071 | 0.0121356 | 1351.21 |
| 1024 | 128 | 8 | 0.574765 | 0.563144 | 0.0116204 | 0.180746 | 0.00651564 | 1634.89 |
| 1024 | 256 | 4 | 0.523838 | 0.513385 | 0.0104525 | 0.175344 | 0.011005 | 1490.03 |
| 1024 | 512 | 2 | 0.515151 | 0.504822 | 0.0103284 | 0.162658 | 0.0111906 | 1465.32 |
| 2048 | 256 | 8 | 0.295707 | 0.290006 | 0.00570012 | 0.112858 | 0.00974759 | 1682.25 |
| 2048 | 512 | 4 | 0.276864 | 0.271508 | 0.00535592 | 0.118606 | 0.010411 | 1575.05 |
| 2048 | 1024 | 2 | 0.261681 | 0.256766 | 0.00491518 | 0.103864 | 0.011015 | 1488.68 |
| 4096 | 512 | 8 | 0.15379 | 0.150815 | 0.00297473 | 0.187843 | 0.00937133 | 1749.79 |
| 4096 | 1024 | 4 | 0.154584 | 0.151774 | 0.00280978 | 0.0897712 | 0.00932318 | 1758.83 |

{{wg:dynamo:Performance_results:Calypso_Latest:Calypso_dev_Elapsed.png?480}}\\
Elapsed (wall-clock) time for the strong scaling. The labels in the plot give the number of OpenMP threads per process. The dotted line shows ideal scaling.

{{wg:dynamo:Performance_results:Calypso_Latest:Calypso_dev_efficiency.png?480}}\\
Parallel efficiency for the strong scaling. The labels in the plot give the number of OpenMP threads per process.

===== Weak Scaling Results =====

^ # of Cores ^ # of Processes ^ # of SMP ^ $l_{max}$ ^ $(N_{r},N_{\theta},N_{\phi})$ ^ Elapsed ^ Nonlinear ^ Solver ^ Comm. ^ SUs ^
| 16 | 4 | 4 | 31 | (513,48,96) | 0.345327 | 0.334976 | 0.0103503 | 0.0346996 | 15.3479 |
| 64 | 16 | 4 | 63 | (513,96,192) | 0.377983 | 0.367511 | 0.0104712 | 0.0701478 | 67.197 |
| 256 | 64 | 4 | 127 | (513,192,384) | 0.506746 | 0.496344 | 0.0104008 | 0.215548 | 360.352 |
| 1024 | 256 | 4 | 255 | (513,384,768) | 0.523838 | 0.513385 | 0.0104525 | 0.175344 | 1490.03 |
| 4096 | 1024 | 4 | 511 | (513,768,1536) | 0.744473 | 0.733799 | 0.010673 | 0.386788 | 8470.44 |

{{wg:dynamo:Performance_results:Calypso_Latest:Calypso_dev_weak_sph.png?480}}\\
Elapsed time for the weak scaling in horizontal resolution, with 4 OpenMP threads per process. The dotted line shows ideal scaling of the Legendre transform.

^ # of Cores ^ # of Processes ^ # of SMP ^ $l_{max}$ ^ $(N_{r},N_{\theta},N_{\phi})$ ^ Elapsed ^ Nonlinear ^ Solver ^ Comm. ^ SUs ^
| 128 | 32 | 4 | 255 | (33,384,768) | 0.253204 | 0.248189 | 0.00501417 | 0.100608 | 90.0281 |
| 256 | 64 | 4 | 255 | (65,384,768) | 0.261203 | 0.256194 | 0.00500897 | 0.0883585 | 185.744 |
| 512 | 128 | 4 | 255 | (129,384,768) | 0.266168 | 0.261061 | 0.00510643 | 0.0962178 | 378.549 |
| 1024 | 256 | 4 | 255 | (257,384,768) | 0.303394 | 0.298234 | 0.00515566 | 0.145423 | 760.043 |
| 2048 | 512 | 4 | 255 | (513,384,768) | 0.276864 | 0.271508 | 0.00535592 | 0.118606 | 1575.05 |
| 4096 | 1024 | 4 | 255 | (1025,384,768) | 0.279425 | 0.27406 | 0.00536459 | 0.127257 | 3179.23 |

{{wg:dynamo:Performance_results:Calypso_Latest:Calypso_dev_weak_r.png?480}}\\
Elapsed time for the weak scaling in radial resolution, with 4 OpenMP threads per process.

[[wg:dynamo:Performance_results:Calypso_Latest:files|files]]
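The SUs column can be recovered from the Elapsed column. The sketch below assumes Elapsed is wall-clock seconds per step, so that SUs = Elapsed × cores × 10^4 / 3600. For the 256-core run this gives 1639.79, which agrees with the tabulated 1639.78 to within rounding. The efficiency formula E = (T_ref × N_ref) / (T × N) is an assumption, with the 256-core, 8-thread run as the reference (the row listed with efficiency 1). The function names are hypothetical and do not come from the benchmark scripts.

```python
# Minimal sketch (assumptions): Elapsed is wall-clock seconds per time step,
# SUs are core hours for 10^4 steps, and strong-scaling efficiency is measured
# against a reference run. Function names are hypothetical illustrations.

SECONDS_PER_HOUR = 3600.0
N_STEPS = 10_000  # the SUs column is quoted for 10^4 time steps


def service_units(elapsed_per_step: float, n_cores: int, n_steps: int = N_STEPS) -> float:
    """Core hours for n_steps time steps, given wall-clock seconds per step."""
    return elapsed_per_step * n_cores * n_steps / SECONDS_PER_HOUR


def strong_scaling_efficiency(elapsed: float, n_cores: int,
                              elapsed_ref: float, n_cores_ref: int) -> float:
    """Assumed definition: E = (T_ref * N_ref) / (T * N); the reference run gives E = 1."""
    return (elapsed_ref * n_cores_ref) / (elapsed * n_cores)


if __name__ == "__main__":
    # Strong-scaling runs with 8 threads per process, copied from the table:
    # (cores, threads per process, elapsed seconds per step)
    runs = [
        (256, 8, 2.30595),
        (512, 8, 1.09925),
        (1024, 8, 0.574765),
        (2048, 8, 0.295707),
        (4096, 8, 0.15379),
    ]
    ref_cores, _, ref_elapsed = runs[0]  # 256-core run used as the reference
    for cores, threads, elapsed in runs:
        su = service_units(elapsed, cores)
        eff = strong_scaling_efficiency(elapsed, cores, ref_elapsed, ref_cores)
        print(f"{cores:5d} cores, {threads} threads: SUs = {su:8.2f}, efficiency = {eff:.3f}")
```

Measuring efficiency against the smallest run in the series keeps the reference row at exactly 1. Under this definition, values above 1 indicate superlinear speedup relative to that run.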