Views: 2036  |  Replies: 1

qh203

Copper Bug (somewhat well known)

[Help] Parallel computation behaves differently as root and as a regular user

As root, I ran the cpi example in parallel with Open MPI on 6 nodes, each with 8 CPUs. The output is normal, as shown below:

[root@node1 examples]# mpirun -np 40 -machinefile test ./cpi
Process 3 on node2
Process 38 on node6
Process 18 on node4
Process 32 on node6
Process 20 on node4
Process 2 on node2
Process 35 on node6
Process 34 on node6
Process 22 on node4
Process 7 on node2
Process 23 on node4
Process 5 on node2
Process 4 on node2
Process 37 on node6
Process 33 on node6
Process 30 on node5
Process 8 on node3
Process 26 on node5
Process 10 on node3
Process 15 on node3
Process 27 on node5
Process 31 on node5
Process 28 on node5
Process 24 on node5
Process 19 on node4
Process 21 on node4
Process 17 on node4
Process 6 on node2
Process 16 on node4
Process 25 on node5
Process 9 on node3
Process 11 on node3
Process 13 on node3
Process 14 on node3
Process 0 on node2
Process 1 on node2
Process 36 on node6
Process 39 on node6
Process 12 on node3
Process 29 on node5
pi is approximately 3.1416009869231245, Error is 0.0000083333333314
wall clock time = 0.128546

As a regular user, running the same cpi example in parallel with Open MPI, the output becomes:

[aojjj@node1 examples]$ mpirun -np 40 -machinefile test ./cpi
libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
    This will severely limit memory registrations.
libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
    This will severely limit memory registrations.
--------------------------------------------------------------------------
The OpenFabrics (openib) BTL failed to register memory in the driver.
Please check /var/log/messages or dmesg for driver specific failure
reason.
The failure occured here:

  Local host:    mthca0
  Device:        openib_reg_mr
  Function:      Cannot allocate memory()
  Errno says:   

You may need to consult with your system administrator to get this
problem fixed.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
The OpenFabrics (openib) BTL failed to initialize while trying to
allocate some locked memory.  This typically can indicate that the
memlock limits are set too low.  For most HPC installations, the
memlock limits should be set to "unlimited".  The failure occured
here:

  Local host:    node4
  OMPI source:   btl_openib_component.c:1161
  Function:      ompi_free_list_init_ex_new()
  Device:        mthca0
  Memlock limit: 32768

You may need to consult with your system administrator to get this
problem fixed.  This FAQ entry on the Open MPI web site may also be
helpful:

    http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages
--------------------------------------------------------------------------
--------------------------------------------------------------------------
WARNING: There was an error initializing an OpenFabrics device.

  Local host:   node4
  Local device: mthca0
--------------------------------------------------------------------------
libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
    This will severely limit memory registrations.
libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
    This will severely limit memory registrations.
libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
    This will severely limit memory registrations.
libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
    This will severely limit memory registrations.
libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
    This will severely limit memory registrations.
libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
    This will severely limit memory registrations.
libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
    This will severely limit memory registrations.
libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
    This will severely limit memory registrations.
libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
    This will severely limit memory registrations.
libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
    This will severely limit memory registrations.
libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
    This will severely limit memory registrations.
libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
    This will severely limit memory registrations.
libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
    This will severely limit memory registrations.
libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
    This will severely limit memory registrations.
libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
    This will severely limit memory registrations.
libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
    This will severely limit memory registrations.
libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
    This will severely limit memory registrations.
libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
    This will severely limit memory registrations.
libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
    This will severely limit memory registrations.
libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
    This will severely limit memory registrations.
libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
    This will severely limit memory registrations.
libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
    This will severely limit memory registrations.
libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
    This will severely limit memory registrations.
libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
    This will severely limit memory registrations.
libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
    This will severely limit memory registrations.
libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
    This will severely limit memory registrations.
libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
    This will severely limit memory registrations.
libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
    This will severely limit memory registrations.
libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
    This will severely limit memory registrations.
libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
    This will severely limit memory registrations.
libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
    This will severely limit memory registrations.
libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
    This will severely limit memory registrations.
libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
    This will severely limit memory registrations.
libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
    This will severely limit memory registrations.
libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
    This will severely limit memory registrations.
libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
    This will severely limit memory registrations.
libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
    This will severely limit memory registrations.
libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
    This will severely limit memory registrations.
Process 26 on node5
Process 8 on node3
Process 28 on node5
Process 1 on node2
Process 29 on node5
Process 4 on node2
Process 22 on node4
Process 2 on node2
Process 15 on node3
Process 25 on node5
Process 31 on node5
Process 38 on node6
Process 14 on node3
Process 30 on node5
Process 32 on node6
Process 39 on node6
Process 37 on node6
Process 33 on node6
Process 36 on node6
Process 35 on node6
Process 16 on node4
Process 18 on node4
Process 10 on node3
Process 21 on node4
Process 19 on node4
Process 20 on node4
Process 11 on node3
Process 17 on node4
Process 9 on node3
Process 0 on node2
Process 7 on node2
Process 6 on node2
Process 5 on node2
Process 23 on node4
Process 24 on node5
Process 3 on node2
Process 27 on node5
Process 34 on node6
Process 12 on node3
Process 13 on node3
pi is approximately 3.1416009869231245, Error is 0.0000083333333314
wall clock time = 3.002147
[node1:02112] 39 more processes have sent help message help-mpi-btl-openib.txt / mem-reg-fail
[node1:02112] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[node1:02112] 36 more processes have sent help message help-mpi-btl-openib.txt / init-fail-no-mem
[node1:02112] 39 more processes have sent help message help-mpi-btl-openib.txt / error in device init

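The calculation still completes, but it now produces many warning and error messages, and the wall clock time rises from 0.128546 to 3.002147.

I modified /etc/security/limits.conf and /etc/init.d/sshd on every node, but the problem persists.

What exactly is going wrong?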

qh203

Copper Bug (somewhat well known)

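I have solved this problem myself. The regular user's memlock limit was too low. As root, add the following two lines to /etc/security/limits.conf on every node, replacing <username> with the name of the regular user:
<username> soft memlock unlimited
<username> hard memlock unlimited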
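
As a quick sanity check before rebooting, the entries above can be looked up programmatically. The following is only a minimal sketch: the function name `memlock_entries` is hypothetical, the user name `aojjj` is taken from the shell prompt in the first post, and the parser is a simplification that handles only exact user names and the `*` wildcard (not group entries or files under /etc/security/limits.d).

```python
def memlock_entries(conf_text, user):
    """Return the soft/hard memlock values that apply to `user`.

    Simplified reading of limits.conf: only exact user names and the
    "*" wildcard are considered, and an exact match wins over "*".
    """
    wildcard, exact = {}, {}
    for raw in conf_text.splitlines():
        # Drop trailing comments, then split into the four expected fields.
        fields = raw.split("#", 1)[0].split()
        if len(fields) != 4:
            continue
        domain, kind, item, value = fields
        if item != "memlock" or domain not in (user, "*"):
            continue
        target = exact if domain == user else wildcard
        # A "-" type sets the soft and hard limits at the same time.
        for k in (("soft", "hard") if kind == "-" else (kind,)):
            target[k] = value
    return {**wildcard, **exact}


# Sample text standing in for the contents of /etc/security/limits.conf.
sample = """
# <domain> <type> <item> <value>
aojjj soft memlock unlimited
aojjj hard memlock unlimited
"""
print(memlock_entries(sample, "aojjj"))
```

On a real node, the text could be read from /etc/security/limits.conf instead of the sample string. A result without both `soft` and `hard` keys would suggest the lines are missing or misspelled on that node.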

Then reboot every server node. <---- This step is important; otherwise, after switching to the regular user, you will get
memlock cannot modify limit: Operation not permitted.
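
To confirm that the new limit is actually in effect for the processes that MPI starts, one option is to print the limit from inside a launched process. This is a sketch only, assuming Python 3 is available on every node. The helper names `describe` and `memlock_report` are made up for illustration. It reads the same RLIMIT_MEMLOCK value, in bytes, that libibverbs reports in its warning.

```python
import resource
import socket


def describe(limit):
    """Format an rlimit value given in bytes; RLIM_INFINITY means no limit."""
    return "unlimited" if limit == resource.RLIM_INFINITY else f"{limit} bytes"


def memlock_report():
    """Return one line with this host's soft and hard locked-memory limits."""
    soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    return f"{socket.gethostname()}: memlock soft={describe(soft)} hard={describe(hard)}"


if __name__ == "__main__":
    print(memlock_report())
```

Saved under a hypothetical name such as check_memlock.py, it could be launched the same way as cpi, for example `mpirun -np 40 -machinefile test python3 check_memlock.py`. Any node that still prints 32768 bytes rather than unlimited has not picked up the new setting.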
Reply #2, 2013-10-13 21:22:41