[[wg:dynamo:Performance_results|Back to performance benchmark lists]] \\

===== Compile options =====

F90OPTFLAGS = -r8 -i4 -ftz -IPF_fma -IPF_fltacc -WB -O3 -xhosts

===== Definition of columns =====

^ Name ^ Description ^
| # of Cores | Number of CPU cores used |
| # of Processes | Number of MPI processes |
| # of Threads | Number of threads for each process |
| $N_{c}$ | Truncation level for Chebyshev polynomials |
| $l_{max}$ | Truncation level for spherical harmonics |
| $(N_{r},N_{\theta},N_{\phi})$ | Number of grid points in spherical coordinates |
| Elapsed | Elapsed (wall clock) time for one time step, in seconds |
| Legendre | Elapsed (wall clock) time for Legendre transform |
| Implicit | Elapsed (wall clock) time for linear calculation |
| Efficiency | Parallel efficiency |
| SUs | Service units (core hours) for $10^{4}$ time steps |

===== Single Processor Result =====

^ $N_{c}$ ^ $l_{max}$ ^ $(N_{r},N_{\theta},N_{\phi})$ ^ Elapsed ^ Legendre ^ Implicit ^ LUdecomp ^ SUs ^
| 71 | 47 | (73,72,144) | 0.96659 | 0.57970 | 0.13313 | 0.010014 | 2.6849 |

===== Strong Scaling Results =====

^ $N_{c}$ ^ $l_{max}$ ^ $(N_{r},N_{\theta},N_{\phi})$ ^
| 191 | 255 | (193,384,768) |

^ # of Cores ^ # of Processes ^ # of SMP ^ Elapsed ^ Legendre ^ Implicit ^ Efficiency ^ SUs ^
| 16 | 16 | 1 | 7.8559 | 3.1132 | 1.0993 | 1.0 | 349.151 |
| 32 | 32 | 1 | 4.4581 | 1.5484 | 0.67073 | 0.881082 | 396.276 |
| 64 | 64 | 1 | 3.4032 | 0.77098 | 0.68621 | 0.577097 | 605.013 |
| 128 | 128 | 1 | 1.0696 | 0.37643 | 0.14921 | 0.918089 | 380.302 |

^ $N_{c}$ ^ $l_{max}$ ^ $(N_{r},N_{\theta},N_{\phi})$ ^
| 255 | 511 | (257,768,1536) |

^ # of Cores ^ # of Processes ^ # of SMP ^ Elapsed ^ Legendre ^ Implicit ^ Efficiency ^ SUs ^
| 64 | 64 | 1 | 13.018 | 4.7327 | 1.9132 | 1.0 | 414.015 |
| 128 | 128 | 1 | 8.7973 | 2.3534 | 1.7398 | 0.555322 | 745.541 |
| 256 | 256 | 1 | 8.678 | 1.1378 | 4.3325 | 0.412058 | 1004.75 |

{{wg:dynamo:Performance_results:Gary_Latest:ucsc2_elapsed.png?480}}\\
Elapsed (wall clock) time for the strong
scaling. The number of OpenMP threads is indicated by the numbers. Ideal scaling is plotted as a dotted line.

{{wg:dynamo:Performance_results:Gary_Latest:ucsc2_efficiency.png?480}}\\
Parallel efficiency for the strong scaling.

===== Weak Scaling Results =====

^ # of Cores ^ # of Processes ^ # of SMP ^ $N_{c}$ ^ $l_{max}$ ^ $(N_{r},N_{\theta},N_{\phi})$ ^ Elapsed ^ Legendre ^ Implicit ^ SUs ^
| 2 | 2 | 1 | 255 | 63 | (257,96,192) | 3.3257 | 1.8103 | 0.47442 | 147.809 |
| 8 | 8 | 1 | 255 | 127 | (257,192,384) | 4.0801 | 1.8754 | 0.51211 | 181.338 |
| 32 | 32 | 1 | 255 | 255 | (257,384,768) | 5.8172 | 2.0489 | 0.8497 | 517.084 |
| 128 | 128 | 1 | 255 | 511 | (257,768,1536) | 9.5023 | 2.3534 | 1.7398 | 3378.6 |
| | | | | | | | | | |
| 3 | 3 | 1 | 255 | 63 | (257,96,192) | 2.3095 | 1.1811 | 0.3029 | 102.644 |
| 9 | 9 | 1 | 255 | 127 | (257,192,384) | 3.5408 | 1.7574 | 0.46589 | 157.369 |
| 33 | 33 | 1 | 255 | 255 | (257,384,768) | 5.528 | 1.7904 | 0.77925 | 737.067 |
| 129 | 129 | 1 | 255 | 511 | (257,768,1536) | 8.9802 | 1.9042 | 1.7902 | 3592.1 |

{{wg:dynamo:Performance_results:Gary_Latest:ucsc2_weak_sph.png?480}}\\
Elapsed time for the weak scaling in the horizontal resolution. Scaling of $O(N_{core}^{1/2})$ is plotted as a dotted line.
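
The Efficiency and SUs columns can be cross-checked against the Elapsed column. The Python sketch below is an illustration only, not the script used to produce this page: the function names, the 16-core node size, and the whole-node rounding are assumptions inferred from the tabulated values. Under those assumptions it reproduces the first strong scaling table (16 to 128 cores) and the weak scaling table above, while the single processor SUs correspond to no rounding (''cores_per_node=1''). The last helper evaluates a power-law reference such as $O(N_{core}^{1/2})$; anchoring it at the smallest run is also an assumption.

```python
import math


def service_units(elapsed_s, n_cores, n_steps=10_000, cores_per_node=16):
    """Core hours for n_steps time steps.

    Assumption: cores are charged in whole nodes of cores_per_node cores.
    Use cores_per_node=1 to charge exactly n_cores.
    """
    charged_cores = math.ceil(n_cores / cores_per_node) * cores_per_node
    return elapsed_s * n_steps * charged_cores / 3600.0


def parallel_efficiency(elapsed_ref, cores_ref, elapsed, cores):
    """Strong scaling efficiency relative to a reference run."""
    return (elapsed_ref * cores_ref) / (elapsed * cores)


def power_law_reference(elapsed_ref, cores_ref, cores, exponent):
    """Reference time elapsed_ref * (cores / cores_ref) ** exponent."""
    return elapsed_ref * (cores / cores_ref) ** exponent


if __name__ == "__main__":
    # Strong scaling, 16-core run as reference (values from the first table)
    print(service_units(7.8559, 16))                    # about 349.15
    print(parallel_efficiency(7.8559, 16, 4.4581, 32))  # about 0.881
    # Weak scaling, 3-core run charged as one 16-core node (assumed)
    print(service_units(2.3095, 3))                     # about 102.64
    # O(Ncore^(1/2)) reference at 8 cores, anchored at the 2-core run
    print(power_law_reference(3.3257, 2, 8, 0.5))       # about 6.65
```

Keeping ''cores_per_node'' as a parameter makes the whole-node assumption explicit, so the same helper can be reused on a machine with a different node size or with per-core charging.
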

^ # of Cores ^ # of Processes ^ # of SMP ^ $N_{c}$ ^ $l_{max}$ ^ $(N_{r},N_{\theta},N_{\phi})$ ^ Elapsed ^ Legendre ^ Implicit ^ SUs ^
| 16 | 16 | 1 | 31 | 511 | (33,768,1536) | 5.1154 | 2.5566 | 0.49331 | 227.351 |
| 32 | 32 | 1 | 63 | 511 | (65,768,1536) | 6.0620 | 2.4221 | 0.76595 | 538.844 |
| 64 | 64 | 1 | 127 | 511 | (129,768,1536) | 6.9065 | 2.3967 | 1.1075 | 1227.82 |
| 128 | 128 | 1 | 255 | 511 | (257,768,1536) | 9.5023 | 2.3534 | 1.7398 | 3378.6 |
| | | | | | | | | | |
| 17 | 17 | 1 | 31 | 511 | (33,768,1536) | 4.2943 | 2.4266 | 0.40772 | 381.716 |
| 33 | 33 | 1 | 63 | 511 | (65,768,1536) | 5.6936 | 2.2617 | 0.67252 | 759.147 |
| 65 | 65 | 1 | 127 | 511 | (129,768,1536) | 6.7351 | 2.1191 | 1.0571 | 1496.69 |
| 129 | 129 | 1 | 255 | 511 | (257,768,1536) | 8.9802 | 1.9042 | 1.7902 | 3592.08 |

{{wg:dynamo:Performance_results:Gary_Latest:ucsc2_weak_r.png?480}}\\
Elapsed time for the weak scaling in the radial resolution. Scaling of $O(N_{core})$ is plotted as a dotted line.

[[wg:dynamo:Performance_results|Back to performance benchmark lists]] \\
[[wg:dynamo:Performance_results:Gary_Latest:files|files]]