[[wg:dynamo:Performance_results|Back to performance benchmark lists]] \\

===== Compile options =====

F90OPTFLAGS = -O3 -warn all -g -xhost -openmp

===== Definition of columns =====

^ Name ^ Description ^
| # of Cores | Number of CPU cores used |
| # of Processes | Number of MPI processes |
| # of Threads | Number of OpenMP threads per MPI process |
| $l_{max}$ | Truncation level for spherical harmonics |
| $(N_{r},N_{\theta},N_{\phi})$ | Number of grid points in spherical coordinates |
| Elapsed | Elapsed wall clock time for one time step, in seconds |
| Nonlinear | Elapsed wall clock time for nonlinear terms (including communications) |
| Solver | Elapsed wall clock time for linear calculation |
| Comm. | Elapsed wall clock time for data communication |
| Efficiency | Parallel efficiency |
| SUs | Service units for $10^{4}$ time steps (core hours) |

===== Single Processor Result =====

^ $l_{max}$ ^ $(N_{r},N_{\theta},N_{\phi})$ ^ Elapsed ^ Nonlinear ^ Solver ^ Comm. ^ SUs ^
| 47 | (73,72,144) | 1.604797 | 1.56274 | 0.042059 | 0.508469 | 71.3243 |

===== Strong Scaling Results =====

^ $l_{max}$ ^ $(N_{r},N_{\theta},N_{\phi})$ ^
| 255 | (513,384,768) |

^ # of Cores ^ # of Processes ^ # of Threads ^ Elapsed ^ Nonlinear ^ Solver ^ Comm. ^ Efficiency ^ SUs ^
| 256 | 32 | 8 | 7.25365 | 7.20543 | 0.0482177 | 1.09003 | 0.940665 | 5158.15 |
| 256 | 64 | 4 | 6.82326 | 6.7743 | 0.0489558 | 0.794641 | 1 | 4852.09 |
| 256 | 128 | 2 | 6.74711 | 6.69947 | 0.0476359 | 0.72289 | 1.01129 | 4797.94 |
| 512 | 64 | 8 | 3.57915 | 3.5559 | 0.0232448 | 0.541038 | 0.953195 | 5090.34 |
| 512 | 128 | 4 | 2.13608 | 2.11468 | 0.0214005 | 0.481005 | 1.59714 | 3037.98 |
| 512 | 256 | 2 | 2.11996 | 2.09772 | 0.0222343 | 0.430132 | 1.60929 | 3015.06 |
| 1024 | 128 | 8 | 1.76799 | 1.75701 | 0.010974 | 0.300582 | 0.964834 | 5028.94 |
| 1024 | 256 | 4 | 1.33388 | 1.32335 | 0.01052 | 0.514778 | 1.27884 | 3794.14 |
| 1024 | 512 | 2 | 2.31397 | 2.30589 | 0.00807709 | 1.36341 | 0.73718 | 6581.96 |
| 2048 | 256 | 8 | 0.838836 | 0.833227 | 0.00560799 | 0.195589 | 1.01677 | 4772.05 |
| 2048 | 512 | 4 | 0.81061 | 0.805883 | 0.0047257 | 0.168418 | 1.05218 | 4611.47 |
| 2048 | 1024 | 2 | 0.672909 | 0.668731 | 0.00417642 | 0.162079 | 1.26749 | 3828.1 |
| 4096 | 512 | 8 | 0.386554 | 0.383808 | 0.00274551 | 0.101968 | 1.10322 | 4398.13 |
| 4096 | 1024 | 4 | 0.3577 | 0.355273 | 0.00242621 | 0.105497 | 1.19221 | 4069.83 |

{{wg:dynamo:Performance_results:Calypso:Calypso11_Elapsed.png?480}}\\
Elapsed (wall clock) time for the strong scaling. The numbers next to the data points give the number of OpenMP threads. The dotted line shows ideal scaling.

{{wg:dynamo:Performance_results:Calypso:Calypso11_efficiency.png?480}}\\
Parallel efficiency for the strong scaling. The numbers next to the data points give the number of OpenMP threads.

===== Weak Scaling Results =====

^ # of Cores ^ # of Processes ^ # of Threads ^ $l_{max}$ ^ $(N_{r},N_{\theta},N_{\phi})$ ^ Elapsed ^ Nonlinear ^ Solver ^ Comm. ^ SUs ^
| 16 | 4 | 4 | 31 | (513,48,96) | 0.500935 | 0.489675 | 0.0112569 | 0.129805 | 22.2638 |
| 64 | 16 | 4 | 63 | (513,96,192) | 0.60037 | 0.591241 | 0.00912827 | 0.21034 | 106.733 |
| 256 | 64 | 4 | 127 | (513,192,384) | 0.730134 | 0.72047 | 0.00966272 | 0.203059 | 519.207 |
| 1024 | 256 | 4 | 255 | (513,384,768) | 1.33388 | 1.32335 | 0.01052 | 0.514778 | 3794.14 |
| 4096 | 1024 | 4 | 511 | (513,768,1536) | 1.94168 | 1.93247 | 0.00921192 | 0.57891 | 22092 |

{{wg:dynamo:Performance_results:Calypso:Calypso11_weak_sph.png?480}}\\
Elapsed time for the weak scaling in the horizontal resolution. Results with 4 OpenMP threads are shown.

^ # of Cores ^ # of Processes ^ # of Threads ^ $l_{max}$ ^ $(N_{r},N_{\theta},N_{\phi})$ ^ Elapsed ^ Nonlinear ^ Solver ^ Comm. ^ SUs ^
| 128 | 32 | 4 | 255 | (33,384,768) | 0.536611 | 0.53181 | 0.00479917 | 0.718157 | 190.795 |
| 256 | 64 | 4 | 255 | (65,384,768) | 0.69748 | 0.69268 | 0.00479858 | 0.171752 | 495.986 |
| 512 | 128 | 4 | 255 | (129,384,768) | 0.694585 | 0.0689725 | 0.00444602 | 0.720039 | 987.854 |
| 1024 | 256 | 4 | 255 | (257,384,768) | 0.809201 | 0.804243 | 0.0049558 | 0.203059 | 2301.73 |
| 2048 | 512 | 4 | 255 | (513,384,768) | 0.81061 | 0.805883 | 0.0047257 | 0.168418 | 4611.47 |
| 4096 | 1024 | 4 | 255 | (1025,384,768) | 0.809201 | 0.804471 | 0.00472797 | 0.17441 | 9206.91 |

{{wg:dynamo:Performance_results:Calypso:Calypso11_weak_r.png?480}}\\
Elapsed time for the weak scaling in the radial resolution. Results with 4 OpenMP threads are shown.

[[wg:dynamo:Performance_results|Back to performance benchmark lists]] \\
[[wg:dynamo:Performance_results:Calypso:files|files]]
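
The SUs and Efficiency columns can be recomputed from the Elapsed and # of Cores columns. The page does not state the formulas, so the following is only a sketch under two assumptions inferred from the tabulated values: SUs are core hours for $10^{4}$ steps with Elapsed in seconds, and Efficiency is measured relative to the strong scaling run with 256 cores, 64 processes, and 4 threads (the row whose Efficiency is 1). The function names are hypothetical.

```python
# Sketch: recompute the derived columns of the strong scaling table.
# Assumptions (inferred from the tabulated values, not stated on the page):
#   - Elapsed is in seconds per time step
#   - SUs = Elapsed * 10^4 steps * cores / 3600 (core hours)
#   - Efficiency is relative to the 256-core, 64-process, 4-thread run

STEPS = 10_000
SECONDS_PER_HOUR = 3600.0

# Reference run: (cores, elapsed seconds per step)
REF_CORES, REF_ELAPSED = 256, 6.82326


def service_units(elapsed, cores, steps=STEPS):
    """Core hours needed for `steps` time steps."""
    return elapsed * steps * cores / SECONDS_PER_HOUR


def parallel_efficiency(elapsed, cores,
                        ref_elapsed=REF_ELAPSED, ref_cores=REF_CORES):
    """Ratio of core-seconds per step: reference run over this run."""
    return (ref_elapsed * ref_cores) / (elapsed * cores)


# A few rows copied from the strong scaling table:
# (cores, processes, threads, elapsed)
rows = [
    (256, 32, 8, 7.25365),
    (512, 128, 4, 2.13608),
    (4096, 1024, 4, 0.3577),
]

for cores, procs, threads, elapsed in rows:
    su = service_units(elapsed, cores)
    eff = parallel_efficiency(elapsed, cores)
    print(f"{cores:5d} {procs:5d} {threads:2d}  SUs={su:9.2f}  eff={eff:.5f}")
```

Under these assumptions the three rows reproduce the tabulated SUs and Efficiency values to the printed precision. Efficiency values above 1 follow directly from this definition whenever a run uses fewer core-seconds per step than the reference run.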