|
|
When PME is used with domain decomposition, separate ranks can be assigned to do only the PME mesh calculation. This is computationally more efficient from about 12 ranks upward, or from even fewer ranks when OpenMP parallelization is used. The number of PME ranks is set with the -npme option, and it cannot be more than half of the total number of ranks. By default, mdrun guesses the number of PME ranks when the total number of ranks is larger than 16. With GPUs, separate PME ranks are not selected automatically, since the optimal setup depends very much on the details of the hardware. In all cases, you might gain performance by optimizing -npme. Performance statistics on this issue are written at the end of the log file. For good load balancing at high parallelization, the PME grid x and y dimensions should be divisible by the number of PME ranks; the simulation still runs correctly when this is not the case.
The screenshot shows this passage. |
|
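The two constraints in the quoted passage (at most half of the ranks may be PME ranks, and the PME grid x and y dimensions should ideally be divisible by the number of PME ranks) can be turned into a quick pre-check. The following is only a minimal sketch: the function name candidate_npme and its parameters are hypothetical and not part of GROMACS, and it looks at nothing beyond those two constraints. The divisibility condition is load-balancing advice rather than a requirement, so values outside the returned list can still run correctly.

```python
def candidate_npme(total_ranks, grid_x, grid_y):
    """List -npme values that satisfy the two constraints from the quoted text.

    Hypothetical helper, not a GROMACS API. It only filters candidates;
    it does not predict which value is fastest.
    """
    # -npme cannot be more than half of the total number of ranks.
    max_pme = total_ranks // 2
    # Keep counts that divide both the PME grid x and y dimensions,
    # which the passage recommends for good load balancing.
    return [
        n
        for n in range(1, max_pme + 1)
        if grid_x % n == 0 and grid_y % n == 0
    ]


if __name__ == "__main__":
    # Example: 32 ranks in total, PME grid of 96 x 96 in x and y.
    print(candidate_npme(32, 96, 96))
```

Each candidate would still need to be tried with -npme and compared using the performance statistics at the end of the log file, since the passage stresses that the best value depends on the hardware.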