Views: 1745 | Replies: 5
04nylxb (Regular Writer)

[Help] MS (Materials Studio) on a Linux cluster: parallel runs fail after installation

I set MS to launch over rsh, and passwordless rsh login is already working on the cluster ("rsh nodexx" switches to the node without asking for any password). Yet every run prints the message below and then fails almost immediately:

bash: /opt/hpmpi/bin/mpid: No such file or directory
bash: /opt/hpmpi/bin/mpid: No such file or directory

Does HP-MPI need any further configuration after the installation? Any pointers would be much appreciated.

I changed cpucorestotal in both gw-info.sbd and gwparams.cfg to the total core count (64) and modified the MPI run parameters to support IB. When I run with 4 processes on a single node, I get the following error; it looks like MPI is at fault. Please advise:

Current trace stack:
model_write_occ_eigenvalues
model_write_all
model_write
geom_BFGS
geometry_optimise
castep
Trapped SIGINT or SIGTERM. Exiting...
Trapped SIGINT or SIGTERM. Exiting...
Trapped SIGINT or SIGTERM. Exiting...
MPI_CPU_AFFINITY set to RANK, setting affinity of rank 2 pid 23177 on host master to cpu 2
MPI_CPU_AFFINITY set to RANK, setting affinity of rank 3 pid 23178 on host master to cpu 3
MPI_CPU_AFFINITY set to RANK, setting affinity of rank 1 pid 23176 on host master to cpu 1
MPI_CPU_AFFINITY set to RANK, setting affinity of rank 0 pid 23175 on host master to cpu 0
MPI Application rank 0 exited before MPI_Finalize() with status 1

————————————————————————————————————————
(Everything below the divider is the older question.)

As in the title: during the parallel installation of MS, the installer asks "should hpmpi use ssh? [Y/n]". My 16 nodes are configured for passwordless rsh and ssh is not set up. If I answer "no" there, will the parallel calculations later be launched over rsh? Any advice appreciated.

Also, how do I uninstall MS under Linux? When I installed earlier under the msi account I chose ssh, forgetting that the cluster is configured for rsh, and setting up ssh would be rather troublesome.

I exported the head node's entire home directory to every compute node via NFS, so a key generated on the head node master is automatically shared to the other nodes ($ ssh-keygen -t rsa creates id_rsa and id_rsa.pub under ~/.ssh by default). When I repeat the same operation on another node, the new key also goes under home, and because home is shared from the head node, ssh-keygen complains that a key already exists and asks whether to overwrite it. I am stuck at this point; please advise.

[Last edited by 04nylxb on 2011-6-22 at 08:31]
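For what it's worth, the "mpid: No such file or directory" message means the HP-MPI daemon binary is missing at /opt/hpmpi/bin/mpid on at least one machine involved in the launch, so a quick thing to check is whether that path exists on every node. Below is a minimal sketch of such a check; it assumes the nodes are named node1..node16 and relies on the passwordless rsh already set up above (classic rsh does not pass back the remote command's exit status, so the check captures output instead of testing the return code):

#!/bin/sh
# Check that HP-MPI's mpid exists on every compute node (node names are assumptions).
MPID=/opt/hpmpi/bin/mpid
for i in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16; do
    node="node$i"
    found=$(rsh "$node" "ls $MPID 2>/dev/null")
    if [ -n "$found" ]; then
        echo "$node: $MPID present"
    else
        echo "$node: $MPID MISSING - install HP-MPI there, or export /opt/hpmpi over NFS"
    fi
done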

lbambool (Renowned Writer)

Floor 2 · 2011-06-22 20:04:20
04nylxb (Regular Writer)
Thanks a lot. The system administrator shared the whole home directory just to make administration easier, and HP-MPI is installed on every node. The problem above is now solved and HP-MPI works. I tried DMol3 and the calculation runs completely normally; CASTEP, however, still fails. The new problem is below — any pointers appreciated.

Job started on host master at Wed Jun 22 21:55:51 2011
MPI_CPU_AFFINITY set to RANK, setting affinity of rank 1 pid 1156 on host master to cpu 1
MPI_CPU_AFFINITY set to RANK, setting affinity of rank 2 pid 1157 on host master to cpu 2
MPI_CPU_AFFINITY set to RANK, setting affinity of rank 0 pid 1155 on host master to cpu 0
MPI_CPU_AFFINITY set to RANK, setting affinity of rank 3 pid 1158 on host master to cpu 3
MPI_CPU_AFFINITY set to RANK, setting affinity of rank 13 pid 7947 on host node3 to cpu 1
MPI_CPU_AFFINITY set to RANK, setting affinity of rank 14 pid 7948 on host node3 to cpu 0
MPI_CPU_AFFINITY set to RANK, setting affinity of rank 25 pid 15375 on host node6 to cpu 1
MPI_CPU_AFFINITY set to RANK, setting affinity of rank 21 pid 6099 on host node5 to cpu 1
MPI_CPU_AFFINITY set to RANK, setting affinity of rank 23 pid 6101 on host node5 to cpu 1
MPI_CPU_AFFINITY set to RANK, setting affinity of rank 26 pid 15376 on host node6 to cpu 0
MPI_CPU_AFFINITY set to RANK, setting affinity of rank 4 pid 24707 on host node1 to cpu 0
MPI_CPU_AFFINITY set to RANK, setting affinity of rank 15 pid 7949 on host node3 to cpu 1
MPI_CPU_AFFINITY set to RANK, setting affinity of rank 12 pid 7946 on host node3 to cpu 0
MPI_CPU_AFFINITY set to RANK, setting affinity of rank 27 pid 15377 on host node6 to cpu 1
MPI_CPU_AFFINITY set to RANK, setting affinity of rank 24 pid 15374 on host node6 to cpu 0
MPI_CPU_AFFINITY set to RANK, setting affinity of rank 20 pid 6098 on host node5 to cpu 0
MPI_CPU_AFFINITY set to RANK, setting affinity of rank 22 pid 6100 on host node5 to cpu 0
MPI_CPU_AFFINITY set to RANK, setting affinity of rank 6 pid 24709 on host node1 to cpu 0
MPI_CPU_AFFINITY set to RANK, setting affinity of rank 7 pid 24710 on host node1 to cpu 1
MPI_CPU_AFFINITY set to RANK, setting affinity of rank 5 pid 24708 on host node1 to cpu 1
MPI_CPU_AFFINITY set to RANK, setting affinity of rank 8 pid 8980 on host node2 to cpu 0
MPI_CPU_AFFINITY set to RANK, setting affinity of rank 10 pid 8982 on host node2 to cpu 0
MPI_CPU_AFFINITY set to RANK, setting affinity of rank 30 pid 14483 on host node7 to cpu 0
MPI_CPU_AFFINITY set to RANK, setting affinity of rank 31 pid 14484 on host node7 to cpu 1
MPI_CPU_AFFINITY set to RANK, setting affinity of rank 9 pid 8981 on host node2 to cpu 1
MPI_CPU_AFFINITY set to RANK, setting affinity of rank 11 pid 8983 on host node2 to cpu 1
MPI_CPU_AFFINITY set to RANK, setting affinity of rank 29 pid 14482 on host node7 to cpu 1
MPI_CPU_AFFINITY set to RANK, setting affinity of rank 28 pid 14481 on host node7 to cpu 0
MPI_CPU_AFFINITY set to RANK, setting affinity of rank 17 pid 17856 on host node4 to cpu 1
MPI_CPU_AFFINITY set to RANK, setting affinity of rank 18 pid 17857 on host node4 to cpu 0
MPI_CPU_AFFINITY set to RANK, setting affinity of rank 19 pid 17858 on host node4 to cpu 1
MPI_CPU_AFFINITY set to RANK, setting affinity of rank 16 pid 17855 on host node4 to cpu 0
warning:regcache incompatible with malloc warning:regcache incompatible with malloc warning:regcache incompatible with malloc
warning:regcache incompatible with malloc warning:regcache incompatible with malloc warning:regcache incompatible with malloc
warning:regcache incompatible with malloc warning:regcache incompatible with malloc warning:regcache incompatible with malloc
warning:regcache incompatible with malloc warning:regcache incompatible with malloc warning:regcache incompatible with malloc
warning:regcache incompatible with malloc warning:regcache incompatible with malloc warning:regcache incompatible with malloc
warning:regcache incompatible with malloc warning:regcache incompatible with malloc warning:regcache incompatible with malloc
warning:regcache incompatible with malloc warning:regcache incompatible with malloc warning:regcache incompatible with malloc
warning:regcache incompatible with malloc warning:regcache incompatible with malloc warning:regcache incompatible with malloc
warning:regcache incompatible with malloc warning:regcache incompatible with malloc warning:regcache incompatible with malloc
MX:node6:mx__connect_common(00:60:dd:48:d9:57):error 36(errno=3) Destination NIC not found in network table
MPI Application rank 26 exited before MPI_Finalize() with status 1
MX:node3:mx__connect_common(00:60:dd:48:d9:28):error 36(errno=3) Destination NIC not found in network table
MX:node3:mx__connect_common(00:60:dd:48:d9:28):error 36(errno=3) Destination NIC not found in network table
MX:node2:Remote endpoint is closed, peer=00:60:dd:48:d8:f0 (node3:0)
MX:node7:Remote endpoint is closed, peer=00:60:dd:48:d8:f0 (node3:0)
MX:node4:Remote endpoint is closed, peer=00:60:dd:48:d8:f0 (node3:0)
MX:node5:Remote endpoint is closed, peer=00:60:dd:48:d8:f0 (node3:0)
MX:node2:Remote endpoint is closed, peer=00:60:dd:48:d8:f0 (node3:0)
MX:node5:Remote endpoint is closed, peer=00:60:dd:48:d8:f0 (node3:0)
MX:node1:Remote endpoint is closed, peer=00:60:dd:48:d8:f0 (node3:0)
MX:node1:Remote endpoint is closed, peer=00:60:dd:48:d8:f0 (node3:0)
MX:node4:Remote endpoint is closed, peer=00:60:dd:48:d8:f0 (node3:0)
MX:node7:Remote endpoint is closed, peer=00:60:dd:48:d8:f0 (node3:0)
MPI Application rank 15 exited before MPI_Finalize() with status 1
MX:node5:Remote endpoint is closed, peer=00:60:dd:48:d8:f0 (node3:0)
MX:node2:Remote endpoint is closed, peer=00:60:dd:48:d8:f0 (node3:0)
MX:node7:Remote endpoint is closed, peer=00:60:dd:48:d8:f0 (node3:0)
MPI Application rank 29 exited before MPI_Finalize() with status 1
MPI Application rank 21 exited before MPI_Finalize() with status 1
forrtl: error (78): process killed (SIGTERM)
Image              PC        Routine    Line      Source
libpthread.so.0    0096D21A  Unknown    Unknown   Unknown
libmyriexpress.so  B6F7535D  Unknown    Unknown   Unknown
libmpi.so.1        B7A3401F  Unknown    Unknown   Unknown
libmpi.so.1        B7A10622  Unknown    Unknown   Unknown
libmpi.so.1        B7A0FFCB  Unknown    Unknown   Unknown
libmpi.so.1        B7A60BDF  Unknown    Unknown   Unknown
libmpi.so.1        B7A6AF17  Unknown    Unknown   Unknown
castepexe_mpi.exe  080A68E9  Unknown    Unknown   Unknown
castepexe_mpi.exe  08F5D992  Unknown    Unknown   Unknown
……………………
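Reading the log above, the MX errors all point at node3: its NIC is reported as "not found in network table", and every other node then sees node3's endpoint as closed. So it is worth exercising the Myrinet/MX fabric one node at a time instead of launching all 32 ranks at once. Below is a minimal sketch of such a test using HP-MPI's appfile launch mode; the test-program path, the appfile name, and the mpi_hello.c source are made up for illustration, and the sketch assumes /opt/hpmpi exists on every node and that home is NFS-shared as described earlier.

# Hypothetical per-node MX test with HP-MPI (paths and file names are assumptions).
export MPI_REMSH=rsh                      # HP-MPI remote-shell variable; the cluster uses rsh, not ssh
/opt/hpmpi/bin/mpicc -o $HOME/mpi_hello $HOME/mpi_hello.c   # trivial MPI test program, visible on all nodes via NFS

# Pair the master with one suspect node at a time via an appfile.
cat > $HOME/appfile.node3 <<EOF
-h master -np 2 $HOME/mpi_hello
-h node3 -np 2 $HOME/mpi_hello
EOF
/opt/hpmpi/bin/mpirun -prot -f $HOME/appfile.node3   # -prot reports which protocol/interconnect each host pair uses

# If only pairs involving node3 reproduce "Destination NIC not found in network
# table" while other pairs run cleanly, node3's Myrinet mapping/cabling is the
# likely culprit. As a temporary workaround the same job can be forced onto TCP:
/opt/hpmpi/bin/mpirun -TCP -f $HOME/appfile.node3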

Floor 3 · 2011-06-22 22:33:35
04nylxb (Regular Writer)

Floor 4 · 2011-06-22 22:43:17
lbambool (Renowned Writer)
[Answer] Accepted helpful reply
04nylxb (+3 coins): Got it, many thanks. I am checking node by node and found that a few of the nodes do not work. 2011-06-23 13:00:32
zzy870720z (+2 coins): Thanks for the advice. 2011-06-23 14:17:53
Are you on InfiniBand? I have no hands-on experience with that technology. If DMol3 works but CASTEP does not, first check whether it is a licence problem; if it is not, a hardware/interconnect configuration issue is the most likely cause. Below is some information I found online that may help. You might show this passage to your administrator and see whether they can act on it.

What is "warning:regcache incompatible with malloc"? Myrinet MX uses a registration cache (see the "Acronyms in high performance interconnect world" table above) to achieve higher performance. When the registration cache feature is enabled, Myrinet MX manages all memory allocations by itself, i.e. it has its own implementation of malloc, free, realloc, mremap, munmap, sbrk, etc. (see mx__regcache.c in the libmyriexpress package). The warning message in question pops up when mx__regcache_works returns 0. On Linux, this means that when a malloc/free pair is called, the variable mx__hook_triggered is not triggered. Registration cache checks can be disabled by setting the environment variable MX_RCACHE to 2. The registration cache can sometimes cause weird errors; it can be disabled entirely by setting the environment variable MX_RCACHE to 0.
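If the administrator wants to try the MX_RCACHE workaround quoted above, one way to apply it is to pass the variable through to every rank at launch time. A minimal sketch, assuming HP-MPI's mpirun is used with an appfile called appfile.all (the appfile name is an assumption):

# Relax only the registration-cache check (per the quoted text, MX_RCACHE=2):
/opt/hpmpi/bin/mpirun -e MX_RCACHE=2 -f $HOME/appfile.all
# Or switch the registration cache off entirely:
/opt/hpmpi/bin/mpirun -e MX_RCACHE=0 -f $HOME/appfile.all
# "-e VAR=value" asks HP-MPI's mpirun to set the variable in the environment of
# each MPI process; alternatively MX_RCACHE can be exported in the shell profile
# that the Materials Studio gateway scripts run under.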

Floor 5 · 2011-06-23 09:04:09
04nylxb (Regular Writer)

Floor 6 · 2011-06-23 17:15:33