[[wg:dynamo:Performance_results|Back to performance benchmark lists]] \\

===== Compile options =====

  F90OPTFLAGS = -O3 -warn all -g -xhost -openmp

===== Definition of columns =====

^ Name ^ Description ^
| # of Cores | Number of CPU cores used |
| # of Processes | Number of MPI processes |
| # of Threads | Number of threads per process |
| $N_{c}$ | Truncation level for Chebyshev polynomials |
| $l_{max}$ | Truncation level for spherical harmonics |
| $(N_{r},N_{\theta},N_{\phi})$ | Number of grid points in spherical coordinates |
| Elapsed time | Elapsed (wall clock) time for one time step |
| Efficiency | Parallel efficiency |
| SUs | Service units (core hours) for $10^{4}$ time steps |

===== Single Processor Result =====

^ $N_{c}$ ^ $l_{max}$ ^ $(N_{r},N_{\theta},N_{\phi})$ ^ Elapsed time ^
| 47 | 48 | (72,72,144) | 0.35723 |

===== Strong Scaling Results =====

In this test, the spatial resolution is fixed and the number of cores is varied. Under ideal scaling, the elapsed time is inversely proportional to the number of cores.

^ $N_{c}$ ^ $l_{max}$ ^ $(N_{r},N_{\theta},N_{\phi})$ ^
| 47 | 48 | (72,72,144) |

^ # of Cores ^ # of Processes ^ # of Threads ^ Elapsed time ^ Efficiency ^ SUs ^
| 1 | 1 | 1 | 0.35723 | 1.0 | 15.8769 |
| 2 | 1 | 2 | 0.18729 | 0.953681 | 8.32400 |
| 4 | 1 | 4 | 0.10288 | 0.868074 | 4.57244 |
| 8 | 1 | 8 | 0.05784 | 0.772022 | 2.57067 |

^ $N_{c}$ ^ $l_{max}$ ^ $(N_{r},N_{\theta},N_{\phi})$ ^
| 97 | 170 | (145,256,512) |

^ # of Cores ^ # of Processes ^ # of Threads ^ Elapsed time ^ Efficiency ^ SUs ^
| 1 | 1 | 1 | 18.3200 | 1.0 | 814.222 |
| 2 | 1 | 2 | 8.69336 | 1.05368 | 386.372 |
| 4 | 1 | 4 | 4.51397 | 1.01463 | 200.621 |
| 8 | 1 | 8 | 2.53104 | 0.904766 | 112.491 |
| 16 | 1 | 16 | 2.16658 | 0.528483 | 96.2924 |

{{wg:dynamo:Performance_results:magic:MagIC_scaling.png?480}}\\
Elapsed (wall clock) time in the strong scaling test for the $(N_{c},l_{max}) = (97,170)$ case. The numbers indicate the number of OpenMP threads. The dotted line shows ideal scaling.
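The Efficiency and SUs columns of the strong scaling tables can be reproduced from the elapsed times alone. The sketch below assumes the usual definition of parallel efficiency, $T_{1} / (p\,T_{p})$, which matches the tabulated values. The SUs formula is an inference: the tabulated numbers are consistent with charging 16 cores for every run regardless of the number of threads used, but the page does not state this, so ''CORES_CHARGED'' should be read as an assumption. The function names are illustrative only.

```python
# Reproduce the Efficiency and SUs columns from the measured elapsed times.
STEPS = 10_000       # SUs are quoted for 10^4 time steps
CORES_CHARGED = 16   # ASSUMPTION: inferred from the tabulated SUs, not stated on the page


def parallel_efficiency(t_serial, t_parallel, cores):
    """Strong-scaling efficiency T_1 / (p * T_p)."""
    return t_serial / (cores * t_parallel)


def service_units(t_step, steps=STEPS, cores_charged=CORES_CHARGED):
    """Core hours for `steps` time steps, with t_step in seconds per step."""
    return t_step * steps * cores_charged / 3600.0


# (cores, elapsed seconds per step) for the (N_c, l_max) = (47, 48) case
runs = [(1, 0.35723), (2, 0.18729), (4, 0.10288), (8, 0.05784)]

t1 = runs[0][1]
for cores, t in runs:
    eff = parallel_efficiency(t1, t, cores)
    sus = service_units(t)
    print(f"{cores:2d} cores  efficiency = {eff:.6f}  SUs = {sus:.4f}")
```

Substituting the elapsed times of the $(N_{c},l_{max}) = (97,170)$ case into ''runs'' gives the second table in the same way, including the efficiencies slightly above 1 at 2 and 4 cores.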
{{wg:dynamo:Performance_results:magic:MagIC_efficiency.png?480}}\\
Parallel efficiency in the strong scaling test for the $(N_{c},l_{max}) = (97,170)$ case. The numbers indicate the number of OpenMP threads.

===== Weak Scaling Results =====

=== Weak Scaling in horizontal direction ===

In this benchmark, the radial resolution is fixed and the horizontal resolution is increased together with the number of cores.

^ # of Cores ^ # of Processes ^ # of Threads ^ $N_{c}$ ^ $l_{max}$ ^ $(N_{r},N_{\theta},N_{\phi})$ ^ Elapsed time ^ SUs ^
| 1 | 1 | 1 | 97 | 42 | (145,64,128) | 0.61665 | 27.4067 |
| 4 | 1 | 4 | 97 | 84 | (145,128,256) | 0.623898 | 27.564 |
| 16 | 1 | 16 | 97 | 170 | (145,256,512) | 2.16658 | 96.2924 |

{{wg:dynamo:Performance_results:magic:MagIC_weak_sph.png?480}}\\
Elapsed (wall clock) time in the weak scaling test for the horizontal resolution. The numbers indicate the number of OpenMP threads.

=== Weak Scaling in radial direction ===

In this benchmark, the horizontal resolution is fixed and the radial resolution is increased together with the number of cores.

^ # of Cores ^ # of Processes ^ # of Threads ^ $N_{c}$ ^ $l_{max}$ ^ $(N_{r},N_{\theta},N_{\phi})$ ^ Elapsed time ^ SUs ^
| 1 | 1 | 1 | 9 | 170 | (8,256,512) | 0.65988 | 29.328 |
| 2 | 1 | 2 | 17 | 170 | (16,256,512) | 0.74609 | 33.1596 |
| 4 | 1 | 4 | 33 | 170 | (32,256,512) | 0.79330 | 35.2578 |
| 8 | 1 | 8 | 63 | 170 | (64,256,512) | 1.01657 | 45.1809 |
| 16 | 1 | 16 | 129 | 170 | (128,256,512) | 1.85728 | 82.5458 |

{{wg:dynamo:Performance_results:magic:MagIC_weak_r.png?480}}\\
Elapsed (wall clock) time in the weak scaling test for the radial resolution. The numbers indicate the number of OpenMP threads.