wg:dynamo:performance_results:titech

Compile options

F90OPTFLAGS = -O3 -warn all -g -xhost -openmp

Notes

Nonlinear terms are calculated twice per time step
All processes hold the full matrix for every spherical harmonic degree
LU decomposition is performed on the full matrix
Time integration uses a solver for banded matrices

Definition of columns

| Name | Description |
| # of Cores | Number of CPU cores used |
| # of Processes | Number of MPI processes |
| # of Threads | Number of threads per process |
| $l_{max}$ | Truncation level of the spherical harmonics |
| $(N_{r},N_{\theta},N_{\phi})$ | Number of grid points in spherical coordinates |
| Elapsed | Elapsed (wall clock) time for one time step |
| Nonlinear | Elapsed (wall clock) time for nonlinear terms (including communications) |
| Solver | Elapsed (wall clock) time for linear calculation |
| Comm. | Elapsed (wall clock) time for data communication |
| Init. | Elapsed (wall clock) time for initialization |
| Efficiency | Parallel efficiency |
| SUs | Service units for $10^{4}$ time steps (core hours) |
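
The horizontal grid sizes listed on this page all follow one pattern relative to $l_{max}$. The sketch below assumes the common 3/2 dealiasing rule, $N_{\theta} = 3(l_{max}+1)/2$ and $N_{\phi} = 2N_{\theta}$. That rule is not stated on this page; it is inferred from the listed values, and `horizontal_grid` is a hypothetical helper name.

```python
def horizontal_grid(l_max):
    """Return (N_theta, N_phi) under the assumed 3/2 rule (inferred, not stated here)."""
    n_theta = 3 * (l_max + 1) // 2
    n_phi = 2 * n_theta
    return n_theta, n_phi


# (l_max, N_theta, N_phi) triples copied from the result tables on this page
listed = [
    (47, 72, 144),
    (31, 48, 96),
    (63, 96, 192),
    (127, 192, 384),
    (255, 384, 768),
    (383, 576, 1152),
    (511, 768, 1536),
]

# Compare the assumed rule against every listed resolution
matches = [horizontal_grid(l_max) == (n_theta, n_phi)
           for l_max, n_theta, n_phi in listed]
print(all(matches))
```

The radial grid size $N_{r}$ is chosen independently of $l_{max}$ in these runs, so it is not part of the helper.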

Single Processor Result

| $l_{max}$ | $(N_{r},N_{\theta},N_{\phi})$ | Elapsed | Nonlinear | Solver | Comm. | Init. | SUs |
| 47 | (73,72,144) | 0.678760 | 0.488277 | 0.190479 | 0.029903 | 2.4152 | 30.1671 |

Strong Scaling Results

| $l_{max}$ | $(N_{r},N_{\theta},N_{\phi})$ |
| 255 | (256,384,768) |

| # of Cores | # of Processes | # of Threads | Elapsed | Nonlinear | Solver | Comm. | Init. | Efficiency | SUs |
| 64 | 8 | 8 | 6.40703 | 5.89877 | 0.508254 | 1.07863 | 3554.5 | 1 | 1139.03 |
| 128 | 16 | 8 | 3.54131 | 3.2940 | 0.247309 | 0.890222 | 3552.51 | 0.904612 | 1259.13 |
| 256 | 32 | 8 | 1.86101 | 1.7352 | 0.125808 | 0.475738 | 3550.63 | 0.860692 | 1323.39 |
| 1024 | 64 | 8 | 1.04298 | 0.977307 | 0.0656672 | 0.361399 | 3552.23 | 0.383937 | 2966.7 |
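
The Efficiency and SUs columns of the strong scaling results can be reproduced from the core count and the elapsed time per step. The sketch below assumes that efficiency is measured relative to the 64-core run and that SUs are cores × elapsed × $10^{4}$ steps / 3600. Both formulas are inferred from the listed values rather than stated on this page, and they were checked only against the strong scaling rows. The function names are hypothetical.

```python
# (cores, elapsed seconds per time step) copied from the strong scaling results
runs = [(64, 6.40703), (128, 3.54131), (256, 1.86101), (1024, 1.04298)]


def parallel_efficiency(cores, elapsed, base_cores, base_elapsed):
    """Core-seconds per step of the baseline divided by that of this run."""
    return (base_cores * base_elapsed) / (cores * elapsed)


def service_units(cores, elapsed, n_steps=10_000):
    """Core hours for n_steps time steps (assumed formula, initialization excluded)."""
    return cores * elapsed * n_steps / 3600.0


base_cores, base_elapsed = runs[0]  # the 64-core run is taken as the baseline
results = [
    (c, parallel_efficiency(c, t, base_cores, base_elapsed), service_units(c, t))
    for c, t in runs
]

for c, eff, su in results:
    print(f"{c:5d} cores: efficiency {eff:.4f}, {su:.1f} core hours")
```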


Elapsed (wall clock) time for the strong scaling. Ideal scaling is shown by the dotted line.


Parallel Efficiency for the strong scaling.

Weak Scaling Results

| # of Cores | # of Processes | # of Threads | $l_{max}$ | $(N_{r},N_{\theta},N_{\phi})$ | Elapsed | Nonlinear | Solver | Comm. | Init. | SUs |
| 2 | 1 | 2 | 31 | (256,48,96) | 0.551937 | 0.391925 | 0.160007 | 0.0214999 | 68.4944 | 15.3479 |
| 8 | 1 | 8 | 63 | (256,96,192) | 1.12191 | 0.939082 | 0.182825 | 0.0476302 | 92.4576 | 67.197 |
| 32 | 4 | 8 | 127 | (256,192,384) | 1.81109 | 1.62482 | 0.186259 | 0.27049 | 271.017 | 360.352 |
| 128 | 16 | 8 | 255 | (256,384,768) | 2.62543 | 2.43587 | 0.189552 | 0.624057 | 969.922 | 1490.03 |
| 512 | 64 | 8 | 511 | (256,768,1536) | 4.65903 | 4.45921 | 0.199815 | 2.01773 | 3545.45 | 8470.44 |


Elapsed time for the weak scaling in the horizontal resolution. Elapsed time per time step is plotted in black, and initialization time in red. Dotted lines show $O(N_{core}^{1/2})$ scaling.
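
To compare the measured horizontal weak scaling with the $O(N_{core}^{1/2})$ reference, one can estimate the exponent as the least-squares slope of log(time) against log(cores). The sketch below uses the values listed in the horizontal weak scaling results; the fitting method and the name `loglog_slope` are illustrative choices, not part of the original benchmark.

```python
import math

# Values copied from the weak scaling results in the horizontal resolution
cores = [2, 8, 32, 128, 512]
elapsed = [0.551937, 1.12191, 1.81109, 2.62543, 4.65903]  # seconds per time step
init = [68.4944, 92.4576, 271.017, 969.922, 3545.45]      # initialization, seconds


def loglog_slope(x, y):
    """Least-squares slope of log(y) versus log(x), i.e. p in y ~ x**p."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    mean_x = sum(lx) / len(lx)
    mean_y = sum(ly) / len(ly)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    sxx = sum((a - mean_x) ** 2 for a in lx)
    return sxy / sxx


step_exponent = loglog_slope(cores, elapsed)
init_exponent = loglog_slope(cores, init)

print(f"time step exponent:      {step_exponent:.2f} (reference 0.50)")
print(f"initialization exponent: {init_exponent:.2f} (reference 0.50)")
```

An exponent below 0.5 means the time grows more slowly with core count than the dotted reference line, and an exponent above 0.5 means it grows faster.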

| # of Cores | # of Processes | # of Threads | $l_{max}$ | $(N_{r},N_{\theta},N_{\phi})$ | Elapsed | Nonlinear | Solver | Comm. | Init. | SUs |
| 96 | 12 | 8 | 383 | (64,576,1152) | 2.76226 | 2.59632 | 0.165937 | 0.509965 | 43.1944 | 736.604 |
| 192 | 24 | 8 | 383 | (128,576,1152) | 2.81134 | 2.63732 | 0.174011 | 0.613411 | 484.779 | 1499.38 |
| 384 | 48 | 8 | 383 | (256,576,1152) | 3.63671 | 3.44881 | 0.187897 | 1.27930 | 6423.87 | 3879.16 |


Elapsed time for the weak scaling in the radial resolution. The results with 4 OpenMP threads are shown.

wg/dynamo/performance_results/titech.txt · Last modified: 2018/11/28 21:55 (external edit)