
Views: 1270  |  Replies: 1

Rankery

Silver Bug (minor reputation)

[Help] Error when simulating a combustion problem with a parallel program (1 participant)

My existing program had been running fine until recently, but today it suddenly failed with the following error:
[node27:10104] plm:tm: failed to spawn daemon, error code = 15012
--------------------------------------------------------------------------
A daemon (pid unknown) died unexpectedly on signal 1  while attempting to
launch so we are aborting.

There may be more information reported by the environment (see above).

This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.

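The error is reported on node27 because it is the first node of the parallel job; switching to other nodes produces the same error.
The program itself restarts from a checkpoint and should not cause any errors. The only thing that changed is that the server was shut down by a power outage a while ago.
Does anyone know what is going on here, and how to fix it?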

charleslian

Wood Bug (minor reputation)

[Answer] Reply to help request

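The power outage broke one of the daemons. On the other nodes in the backend, that daemon cannot find the shared libraries it is supposed to load. My guess is that it is a daemon of the PBS job management system. Try restarting all of the daemons; if files were corrupted, the only option is to reinstall.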

Reply #2, 2015-07-21 21:24:23
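
Both the error message and the reply point at shared libraries that a daemon cannot find on the compute nodes. One way to test that idea is to run `ldd` on the daemon binary on each node and look for "not found" entries. The Python sketch below is only an illustration under assumptions that the thread does not confirm: that the nodes run Linux with `ldd` available, and that the daemon in question is Open MPI's `orted` (the binary name is a parameter, so a PBS daemon can be checked the same way). The helper names `find_missing_libs` and `check_binary` are made up for this example.

```python
from __future__ import annotations

import shutil
import subprocess


def find_missing_libs(ldd_output: str) -> list[str]:
    """Return the library names that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # An unresolved dependency is printed as "libfoo.so.1 => not found".
        if "=>" in line and "not found" in line:
            missing.append(line.split("=>")[0].strip())
    return missing


def check_binary(name: str = "orted") -> list[str]:
    """Run ldd on a binary found on PATH and list its unresolved libraries.

    The default name "orted" is an assumption (Open MPI's launch daemon);
    pass a different binary name to check another daemon.
    """
    path = shutil.which(name)
    if path is None:
        raise FileNotFoundError(f"{name} is not on PATH on this node")
    result = subprocess.run(
        ["ldd", path], capture_output=True, text=True, check=False
    )
    return find_missing_libs(result.stdout)
```

Running `check_binary()` on node27 and on one other node would show whether the libraries resolve there. An empty list means `ldd` found everything, which would suggest the problem lies with the PBS daemons themselves rather than with `LD_LIBRARY_PATH`.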