Views: 4090 | Replies: 1
[Help]
Submitting parallel siesta with qsub fails; asking for help
Hi everyone, I'd like to ask about a problem I ran into running a program on our cluster. Thanks in advance.

[node21:10714] *** An error occurred in MPI_Comm_rank
[node21:10714] *** on communicator MPI_COMM_WORLD
[node21:10714] *** MPI_ERR_COMM: invalid communicator
[node21:10714] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
--------------------------------------------------------------------------
mpirun has exited due to process rank 3 with PID 10711 on node node21
exiting improperly. There are two reasons this could occur:
1. this process did not call "init" before exiting, but others in the job
did. This can cause a job to hang indefinitely while it waits for all
processes to call "init". By rule, if one process calls "init", then ALL
processes must call "init" prior to termination.
2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"
This may have caused other processes in the application to be terminated
by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
[node21:10707] 7 more processes have sent help message help-mpi-errors.txt / mpi_errors_are_fatal
[node21:10707] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

This is the output from a freshly compiled siesta: the job submitted with qsub ends almost immediately with the messages above. Could someone help me diagnose what is going on? The same code, compiled on another cluster, runs fine there when launched directly with mpirun -np 4 siesta, so I don't understand why it fails under qsub on the new cluster. The new cluster does not allow logging in to the compute nodes, so I have to get the qsub route working. Many thanks.

I can't tell where the problem lies. lammps and vasp, both compiled for parallel runs in this same environment, work fine; only siesta refuses to run when submitted with qsub, even though the parallel siesta build runs without trouble on a compute node of the other cluster via mpirun -np 4 siesta. Quite frustrating.

Oh, and I have already tried mpirun on the login node; please have a look. It seems the administrator has restricted things there, so it cannot be used either:

mpirun -np 4 siesta
libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
    This will severely limit memory registrations.
(the warning above is printed four times, once per process)
--------------------------------------------------------------------------
The OpenFabrics (openib) BTL failed to initialize while trying to
allocate some locked memory. This typically can indicate that the
memlock limits are set too low. For most HPC installations, the
memlock limits should be set to "unlimited". The failure occured here:
  Local host:    manage1
  OMPI source:   btl_openib_component.c:1115
  Function:      ompi_free_list_init_ex_new()
  Device:        mlx4_0
  Memlock limit: 32768
You may need to consult with your system administrator to get this
problem fixed. This FAQ entry on the Open MPI web site may also be
helpful:
  http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages
--------------------------------------------------------------------------
--------------------------------------------------------------------------
WARNING: There was an error initializing an OpenFabrics device.
  Local host:   manage1
  Local device: mlx4_0
--------------------------------------------------------------------------
[manage1:16214] *** An error occurred in MPI_Comm_rank
[manage1:16214] *** on communicator MPI_COMM_WORLD
[manage1:16214] *** MPI_ERR_COMM: invalid communicator
[manage1:16214] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
--------------------------------------------------------------------------
mpirun has exited due to process rank 0 with PID 16212 on node manage1
exiting improperly. There are two reasons this could occur:
(same two reasons as in the first message above)
--------------------------------------------------------------------------
[manage1:16211] 3 more processes have sent help message help-mpi-btl-openib.txt / init-fail-no-mem
[manage1:16211] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[manage1:16211] 3 more processes have sent help message help-mpi-btl-openib.txt / error in device init
[manage1:16211] 3 more processes have sent help message help-mpi-errors.txt / mpi_errors_are_fatal

Access to the compute nodes is completely locked down, so I cannot log in to them. The cluster runs PBS with the GridView job-management front end, and my submission script is:
====================================
#PBS -N test
#PBS -l nodes=1:ppn=8
#PBS -j oe
#PBS -l walltime=24:00:00
cd $PBS_O_WORKDIR
NP=`cat $PBS_NODEFILE|wc -l`
source /public/software/mpi/openmpi1.5.4-intel.sh
mpirun -machinefile $PBS_NODEFILE -np $NP \
/home/sw/siesta/siesta-3.1/Obj/siesta < fe.fdf | tee output
=====================================
Thanks to any fellow forum members who can help.
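For what it's worth: an MPI_ERR_COMM "invalid communicator" raised right inside MPI_Comm_rank often indicates that the binary was built against a different MPI implementation (or Open MPI version) than the mpirun that launches it, rather than a problem with the job script itself. A quick check on the login node, reusing the paths from the script above (adjust them to your own setup; this is only a sketch, not a guaranteed diagnosis), might be:
====================================
# Load the same MPI environment the job script sources
source /public/software/mpi/openmpi1.5.4-intel.sh
# Which mpirun will be picked up, and which Open MPI version is it?
which mpirun
mpirun --version
# Which MPI libraries is the siesta binary actually linked against?
ldd /home/sw/siesta/siesta-3.1/Obj/siesta | grep -i mpi
====================================
If ldd reports MPI libraries from a tree other than the openmpi1.5.4-intel installation that the job script sources, rebuilding siesta with that Open MPI's compiler wrappers (mpif90/mpicc) would be the first thing to try.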
redsnowolf
[Answer] Helpful reply
liliangfang: coin +1, thanks for the exchange  2012-09-15 15:12:35
I ran into a similar problem with vasp a couple of days ago and have just solved it.

The OpenFabrics (openib) BTL failed to initialize while trying to
allocate some locked memory. This typically can indicate that the
memlock limits are set too low. For most HPC installations, the
memlock limits should be set to "unlimited". The failure occured here:
  Local host:    node21
  OMPI source:   btl_openib_component.c:1055
  Function:      ompi_free_list_init_ex_new()
  Device:        mlx4_0
  Memlock limit: 65536
You may need to consult with your system administrator to get this
problem fixed. This FAQ entry on the Open MPI web site may also be
helpful:
  http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages
--------------------------------------------------------------------------
--------------------------------------------------------------------------
WARNING: There was an error initializing an OpenFabrics device.

Items 15, 16 and 17 at that URL explain it fairly clearly. In my case ulimit -a showed a normal locked-memory limit on every node, yet the job still failed with this memory-allocation error. The FAQ says this can happen when a login session does not apply the system-wide locked-memory setting, or when the job scheduler does not give the application a large enough limit. In the end I restarted the PBS scheduler daemon on every node and the problem went away.

Alternatively, you could add ulimit -l unlimited before the mpirun line and resubmit with qsub to see whether that helps.

Hope this is useful to the OP.
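To make the second suggestion concrete, here is a sketch of the OP's submission script with ulimit -l unlimited added before mpirun (paths copied from the first post; whether the raised limit actually reaches the MPI processes still depends on how the PBS node daemon, pbs_mom, was started, which is why restarting the daemons can also be necessary):
====================================
#PBS -N test
#PBS -l nodes=1:ppn=8
#PBS -j oe
#PBS -l walltime=24:00:00
cd $PBS_O_WORKDIR
NP=`cat $PBS_NODEFILE|wc -l`
source /public/software/mpi/openmpi1.5.4-intel.sh
# Raise the locked-memory limit for this job before launching MPI;
# this only takes effect if the node's hard limit allows it.
ulimit -l unlimited
mpirun -machinefile $PBS_NODEFILE -np $NP \
/home/sw/siesta/siesta-3.1/Obj/siesta < fe.fdf | tee output
====================================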
Reply #2, 2012-09-15 14:24:17