
wuchenwf

Honorary Moderator (Professional Writer)

[Discussion] Problem running mpdtrace after successfully installing mpich2, thanks everyone

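Hello everyone. My machine is a 4-core Intel box. I built mpich2 with ifort, and the installation created the directories and the required files such as mpd. Running mpd succeeds, but running mpdtrace gives an error. Below are my commands and the error output (my hostname is wuchenwf-desktop and mpich2 is installed under /opt/mpich2):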
---------------------------------------------------------------------
root@wuchenwf-desktop:~# mpd &
[1] 8542
root@wuchenwf-desktop:~# mpdtrace
mpdtrace (send_dict_msg 426):send_dict_msg: sock= errmsg=32, 'Broken pipe'):
mpdtb:
   /opt/mpich2/bin/mpdlib.py,  426,  send_dict_msg
   /opt/mpich2/bin/mpdtrace,  51,  mpdtrace
   /opt/mpich2/bin/mpdtrace,  83,  

mpdtrace: unexpected msg from mpd=:{'error_msg': 'invalid secretword to root mpd'}:
--------------------------------------------------------------------------
It looks like two errors to me, and running mpdallexit gives a similar error. How can I fix this? Sorry to trouble everyone, and many thanks.
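The last message, `invalid secretword to root mpd`, points at the secretword that mpd and its client commands (mpdtrace, mpdallexit) must share. A minimal sketch for checking the config file follows. It assumes, and this should be verified against the documentation for your MPICH2 version, that a non-root user keeps a `secretword=...` line in `~/.mpd.conf`, that root uses `/etc/mpd.conf`, and that the file must be readable and writable only by its owner (mode 600). The function names are hypothetical.

```python
import os
import stat
from typing import List


def default_mpd_conf() -> str:
    """Assumed location: /etc/mpd.conf for root, ~/.mpd.conf for other users."""
    if os.geteuid() == 0:
        return "/etc/mpd.conf"
    return os.path.expanduser("~/.mpd.conf")


def check_mpd_conf(path: str) -> List[str]:
    """Return a list of problems found in an mpd config file (empty if it looks OK)."""
    if not os.path.isfile(path):
        return [f"{path} does not exist"]
    problems = []
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode & 0o077:  # any group/other permission bit is set
        problems.append(f"{path} has mode {oct(mode)}; expected 0o600")
    with open(path) as f:
        secrets = [ln.strip() for ln in f if ln.strip().startswith("secretword=")]
    if not secrets:
        problems.append("no 'secretword=' line found")
    elif len(secrets) > 1:
        problems.append("more than one 'secretword=' line found")
    elif secrets[0] == "secretword=":
        problems.append("secretword is empty")
    return problems


if __name__ == "__main__":
    conf = default_mpd_conf()
    for problem in check_mpd_conf(conf) or [f"{conf} looks OK"]:
        print(problem)
```

Run it as the same user that started mpd, so that it inspects the file that user's mpd would read.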

alwens

Die-hard Muchong Member (Regular Writer)

Veteran Muchong Member

I saved the following article; it is very useful for MPICH2 users.

Appendix: Debugging mpd errors in mpich2
1. Install mpich2, and thus mpd.
2. Make sure the mpich2 bin directory is in your path. Below, we will
refer to it as MPDDIR.
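One way to confirm step 2 is to check that every mpd tool resolves on PATH and that they all come from the same directory (MPDDIR), since an mpiexec from a different MPI installation earlier on PATH can be confusing. A minimal sketch using only the Python standard library; the tool list is taken from the commands used in this guide, and the helper names are hypothetical.

```python
import os
import shutil
from typing import Dict, List, Optional, Sequence

# Commands used in this guide; adjust the tuple if your installation differs.
MPD_TOOLS = ("mpd", "mpdcheck", "mpdtrace", "mpdboot", "mpdallexit", "mpiexec")


def locate_tools(tools: Sequence[str] = MPD_TOOLS) -> Dict[str, Optional[str]]:
    """Map each tool name to its full path on PATH, or None if it is not found."""
    return {tool: shutil.which(tool) for tool in tools}


def report(found: Dict[str, Optional[str]]) -> List[str]:
    """List missing tools, and warn if the found tools live in different directories."""
    lines = [f"not on PATH: {tool}" for tool, path in found.items() if path is None]
    dirs = {os.path.dirname(path) for path in found.values() if path is not None}
    if len(dirs) > 1:
        lines.append(f"tools come from several directories: {sorted(dirs)}")
    return lines


if __name__ == "__main__":
    for line in report(locate_tools()) or ["all tools found in one directory (MPDDIR)"]:
        print(line)
```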
3. Kill old mpd processes. If you are coming to this guide from
elsewhere, e.g. a Quick Start guide for mpich2, because you
encountered mpd problems, you should make sure that all mpd processes
are terminated on the hosts where you have been testing. mpdallexit
may assist in this, but probably not if you were having problems. You
may need to use the Unix kill command to terminate the processes.
4. Run a first mpd (alone on a first node). As mentioned above, mpd
uses client-server communications to perform its work. So, before
running an mpd, let's run a simpler program (mpdcheck) to verify that
these communications are likely to be successful. Even on hosts where
communications are well supported, there are sometimes problems
associated with hostname resolution, etc. So, it is worth the effort
to proceed a bit slowly. Below, we assume that you have installed mpd
and have it in your path.
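The hostname-resolution problems mentioned above can often be spotted before any mpd is started. A minimal sketch using only the Python standard library resolves this machine's own hostname and flags a loopback result, which is likely related to the later warning about the server printing localhost. On some Debian/Ubuntu systems, for example, /etc/hosts maps the machine's own name to 127.0.1.1. `resolve_host` is a hypothetical helper name.

```python
import ipaddress
import socket
from typing import Optional, Tuple


def resolve_host(hostname: Optional[str] = None) -> Tuple[str, str, bool]:
    """Resolve a host name (default: this machine's) to IPv4 and flag loopback."""
    name = hostname or socket.gethostname()
    addr = socket.gethostbyname(name)  # typically consults /etc/hosts and/or DNS
    return name, addr, ipaddress.ip_address(addr).is_loopback


if __name__ == "__main__":
    try:
        name, addr, loopback = resolve_host()
    except socket.gaierror as exc:
        print("hostname does not resolve:", exc)
    else:
        print(f"{name} -> {addr}")
        if loopback:
            print("warning: loopback address; other hosts cannot reach this one there")
```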
Select a test node, let's call it n1. Log in to n1.
First, we will run mpdcheck as a server and a client. To run it as a
server, open a command-line window and run:
n1 $ mpdcheck -s
It will print something like this:
server listening at INADDR_ANY on: n1 1234
Now, run the client side (in another window if convenient) and see
if it can find the server and communicate. Be sure to use the same
hostname and port number printed by the server (above: n1 1234):
n1 $ mpdcheck -c n1 1234
If all goes well, the server will print something like:
server has conn on ... from ('192.168.1.1', 1234)
server successfully recvd msg from client:
hello_from_client_to_server
A TROUBLESHOOTING MPDS 29
and the client will print:
client successfully recvd ack from server:
ack_from_server_to_client
If the experiment failed, you have a network or machine configuration
problem, which will also be a problem later when you try to use mpd.
If the experiment succeeded but the hostname printed by the server
was localhost, you will probably have problems later if you try to
use mpd on n1 in conjunction with other hosts. In either case, skip
to Section A.2 "Debugging host/network configuration problems."
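To make concrete what this experiment exercises, here is a minimal Python sketch of the same kind of TCP round trip: a server listening on all interfaces (INADDR_ANY) on a port picked by the OS, and a client that sends a greeting and waits for an acknowledgement. It is not mpdcheck's source; the message strings are only borrowed from the output shown above, and all function names are hypothetical.

```python
import socket
import threading
from typing import List, Tuple

HELLO = b"hello_from_client_to_server"
ACK = b"ack_from_server_to_client"


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes unless the peer closes the connection first."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            break
        buf += chunk
    return buf


def start_server() -> Tuple[socket.socket, int]:
    """Listen on all interfaces (INADDR_ANY) on a port chosen by the OS."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("", 0))
    srv.listen(1)
    return srv, srv.getsockname()[1]


def serve_once(srv: socket.socket, log: List[Tuple[Tuple[str, int], bytes]]) -> None:
    """Accept one client, record its address and message, and reply with ACK."""
    conn, addr = srv.accept()
    with conn:
        log.append((addr, recv_exact(conn, len(HELLO))))
        conn.sendall(ACK)
    srv.close()


def client_check(host: str, port: int, timeout: float = 5.0) -> bool:
    """Connect to the server, send HELLO, and report whether ACK came back."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(HELLO)
        return recv_exact(sock, len(ACK)) == ACK


if __name__ == "__main__":
    # Single-machine demo over loopback only.
    srv, port = start_server()
    print("server listening at INADDR_ANY on:", socket.gethostname(), port)
    log: List[Tuple[Tuple[str, int], bytes]] = []
    t = threading.Thread(target=serve_once, args=(srv, log))
    t.start()
    print("client got ack:", client_check("127.0.0.1", port))
    t.join()
    print("server got", log[0][1].decode(), "from", log[0][0])
```

Across two hosts, the server half would run on n1 and `client_check("n1", port)` on n2, which also tests whether n2 can resolve the name n1. A blocked port would show up as a timeout or a refused connection.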
If the experiment succeeded, then you should be ready to try mpd on
this one host. To start an mpd, you will use the mpd command. To run
parallel programs, you will use the mpiexec program. All mpd commands
accept the -h or --help arguments, e.g.:
n1 $ mpd --help
n1 $ mpiexec --help
Try a few tests:
n1 $ mpd &
n1 $ mpiexec -n 1 /bin/hostname
n1 $ mpiexec -l -n 4 /bin/hostname
n1 $ mpiexec -n 2 PATH_TO_MPICH2_EXAMPLES/cpi
where PATH_TO_MPICH2_EXAMPLES is the path to the mpich2-1.0.3/examples
directory.
To terminate the mpd:
n1 $ mpdallexit
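The single-host tests above can be scripted so that each command is reported as PASS or FAIL together with its output. A minimal sketch, assuming an mpd has already been started by hand with `mpd &`; `SMOKE_TESTS` and `run_smoke_tests` are hypothetical names, and the cpi path is the same PATH_TO_MPICH2_EXAMPLES placeholder as in the text, to be edited for your installation.

```python
import subprocess
from typing import List, Sequence, Tuple

# The step 4 mpiexec tests. Replace PATH_TO_MPICH2_EXAMPLES with your examples path.
SMOKE_TESTS: List[List[str]] = [
    ["mpiexec", "-n", "1", "/bin/hostname"],
    ["mpiexec", "-l", "-n", "4", "/bin/hostname"],
    ["mpiexec", "-n", "2", "PATH_TO_MPICH2_EXAMPLES/cpi"],
]


def run_smoke_tests(commands: Sequence[Sequence[str]],
                    timeout: float = 60.0) -> List[Tuple[str, bool, str]]:
    """Run each command; return (command line, succeeded?, stdout + stderr)."""
    results = []
    for cmd in commands:
        line = " ".join(cmd)
        try:
            proc = subprocess.run(list(cmd), capture_output=True, text=True,
                                  timeout=timeout)
            results.append((line, proc.returncode == 0, proc.stdout + proc.stderr))
        except (OSError, subprocess.TimeoutExpired) as exc:
            # Command not found, not executable, or hung (e.g. no mpd running).
            results.append((line, False, str(exc)))
    return results
```

Usage: `for line, ok, out in run_smoke_tests(SMOKE_TESTS): print("PASS" if ok else "FAIL", line)`, then `mpdallexit` as above. The timeout keeps a hung mpiexec from blocking the remaining tests.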
5. Run a second mpd (alone on a second node). To verify that things
are fine on a second host (say n2), log in to n2 and perform the same
set of tests that you did on n1. Make sure that you use mpdallexit to
terminate the mpd so you will be ready for further tests.
6. Run a ring of two mpds on two hosts. Before running a ring of mpds
on n1 and n2, we will again use mpdcheck, but this time between the
two machines. We do this because the two nodes may have trouble
locating each other or communicating between them and it is easier
to check this out with the smaller program.
First, we will make sure that a server on n1 can service a client from
n2. On n1:
n1 $ mpdcheck -s
which will print a hostname (hopefully n1) and a port number (say
3333 here). On n2:
n2 $ mpdcheck -c n1 3333
If this experiment fails, skip to Section A.2 "Debugging host/network
configuration problems".
Second, we will make sure that a server on n2 can service a client
from n1. On n2:
n2 $ mpdcheck -s
which will print a hostname (hopefully n2) and a port number (say
7777 here). On n1:
n1 $ mpdcheck -c n2 7777
If this experiment fails, skip to Section A.2 "Debugging host/network
configuration problems".
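In both directions, the client command is built from the hostname and port that the server printed on the other machine. A small sketch of that step, assuming the server line has the form shown earlier ("server listening at INADDR_ANY on: n1 1234"); other mpdcheck versions may print it differently. It also refuses a localhost result, which, as noted earlier, signals trouble for multi-host use. Function names are hypothetical.

```python
import re
from typing import List, Tuple

# Format taken from the example output earlier in this guide.
_LISTEN_RE = re.compile(r"server listening at INADDR_ANY on:\s*(\S+)\s+(\d+)")


def parse_listen_line(text: str) -> Tuple[str, int]:
    """Extract (hostname, port) from the output of `mpdcheck -s`."""
    m = _LISTEN_RE.search(text)
    if m is None:
        raise ValueError("no 'server listening' line found")
    return m.group(1), int(m.group(2))


def client_command(text: str) -> List[str]:
    """Build the `mpdcheck -c HOST PORT` command to run on the other machine."""
    host, port = parse_listen_line(text)
    if host in ("localhost", "localhost.localdomain"):
        raise ValueError("server reported localhost; fix hostname resolution first")
    return ["mpdcheck", "-c", host, str(port)]
```

For example, the n1 server output "server listening at INADDR_ANY on: n1 3333" yields `["mpdcheck", "-c", "n1", "3333"]` to run on n2.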
If all went well, we are ready to try a pair of mpds on n1 and n2.
First, make sure that all mpds have terminated on both n1 and n2.
Use mpdallexit or simply kill them with:
kill -9 PID_OF_MPD
where you have obtained PID_OF_MPD by some means such as the
ps command.
On n1:
n1 $ mpd &
n1 $ mpdtrace -l
This will print a list of machines in the ring, in this case just n1.
The output will be something like:
n1_6789 (192.168.1.1)
The 6789 is the port that the mpd is listening on for connections
from other mpds wishing to enter the ring. We will use that port in a
moment to get an mpd from n2 into the ring. The value in parentheses
should be the IP address of n1.
On n2:
n2 $ mpd -h n1 -p 6789 &
where 6789 is the listening port on n1 (from mpdtrace above). Now try:
n2 $ mpdtrace -l
You should see both mpds in the ring.
To run some programs in parallel:
n1 $ mpiexec -n 2 /bin/hostname
n1 $ mpiexec -n 4 /bin/hostname
n1 $ mpiexec -l -n 4 /bin/hostname
n1 $ mpiexec -l -n 4 PATH_TO_MPICH2_EXAMPLES/cpi
where PATH_TO_MPICH2_EXAMPLES is the path to the mpich2-1.0.5/examples
directory.
To bring down the ring of mpds:
n1 $ mpdallexit
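The ring-building steps above depend on reading `mpdtrace -l` by eye: taking the port from n1's line to start the mpd on n2, then checking that both hosts appear. A small sketch that does this parsing, assuming each line has the form shown above ("n1_6789 (192.168.1.1)"); `RingMember`, `parse_mpdtrace_l` and `join_command` are hypothetical names.

```python
import re
from typing import List, NamedTuple


class RingMember(NamedTuple):
    host: str
    port: int
    ip: str


# One `mpdtrace -l` line, e.g. "n1_6789 (192.168.1.1)"; the port follows the last "_".
_TRACE_RE = re.compile(r"^(\S+)_(\d+)\s+\(([^)]*)\)$")


def parse_mpdtrace_l(output: str) -> List[RingMember]:
    """Return one RingMember per recognised line of `mpdtrace -l` output."""
    members = []
    for line in output.splitlines():
        m = _TRACE_RE.match(line.strip())
        if m:
            members.append(RingMember(m.group(1), int(m.group(2)), m.group(3)))
    return members


def join_command(entry: RingMember) -> List[str]:
    """Command that starts an mpd on another host and joins it to `entry`'s ring."""
    return ["mpd", "-h", entry.host, "-p", str(entry.port)]
```

On n2 the join command would be started in the background, as `&` does in the shell, e.g. with `subprocess.Popen(join_command(members[0]))`. Afterwards, `{m.host for m in parse_mpdtrace_l(output)}` should contain both n1 and n2.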
7. Boot a ring of two mpds via mpdboot. Please be aware that mpdboot
uses ssh by default to start remote mpds. It will expect that you can
run ssh from n1 to n2 (and from n2 to n1) without entering a password.
First, make sure that you terminate the mpd processes from any prior tests.
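Because mpdboot uses ssh by default and cannot answer a password prompt, it helps to confirm passwordless ssh in both directions first. A minimal sketch relying on two standard OpenSSH client options: `BatchMode=yes`, which makes ssh fail instead of prompting for a password or passphrase, and `ConnectTimeout`. Host names are placeholders and the function names are hypothetical.

```python
import subprocess
from typing import List


def ssh_check_command(host: str) -> List[str]:
    """ssh command that fails rather than prompting if a password would be needed."""
    return ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=10", host, "true"]


def can_ssh_without_password(host: str) -> bool:
    """True if `ssh host true` succeeds non-interactively."""
    try:
        result = subprocess.run(ssh_check_command(host), capture_output=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0
```

Run `can_ssh_without_password("n2")` on n1 and `can_ssh_without_password("n1")` on n2. Both should return True before you try mpdboot.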
On n1, create a file named mpd.hosts containing the name of n2:
n2
Then, on n1 run:
n1 $ mpdboot -n 2
n1 $ mpdtrace -l
n1 $ mpiexec -l -n 2 /bin/hostname
The mpdboot command should read the mpd.hosts file created above
and run an mpd on each of the two machines. The mpdtrace and mpiexec
commands show that the ring is up and functional. Options that may be
useful are:
· --help: use this one for extra details on all options
· -v: verbose
· --chkup: tries to verify that the hosts are up before starting mpds
· --chkuponly: only performs the verify step, then ends
To bring the ring down:
n1 $ mpdallexit
If mpdboot works on the two machines n1 and n2, it will probably work
on your others as well. But there could be configuration problems
with a new machine on which you have not yet tested mpd. An easy way
to check is to add them to mpd.hosts gradually and, each time, run
mpdboot with a -n argument that uses them all. Use mpdallexit after
each test.
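The gradual approach just described can be scripted. The sketch below rests on two assumptions drawn from the n1/n2 example: mpd.hosts lists only the remote hosts, so `-n` is the number of listed hosts plus one for the local machine, and mpdboot reads `mpd.hosts` from the current directory by default. Host names and function names are hypothetical; note that `run_plan` overwrites `./mpd.hosts`.

```python
import subprocess
from typing import Iterator, List, Sequence, Tuple


def gradual_boot_plan(hosts: Sequence[str]) -> Iterator[Tuple[List[str], List[str]]]:
    """Yield (mpd.hosts entries, mpdboot command), adding one host per round.

    Assumption from the example: -n counts the local host plus every listed host.
    """
    for k in range(1, len(hosts) + 1):
        yield list(hosts[:k]), ["mpdboot", "-n", str(k + 1)]


def write_hosts_file(hosts: Sequence[str], path: str = "mpd.hosts") -> None:
    """Write one host name per line, as in the mpd.hosts example above."""
    with open(path, "w") as f:
        f.write("\n".join(hosts) + "\n")


def run_plan(hosts: Sequence[str]) -> None:
    """Boot, inspect, and tear down the ring each round; stop at the first failure."""
    for subset, boot_cmd in gradual_boot_plan(hosts):
        write_hosts_file(subset)  # overwrites ./mpd.hosts
        print("round:", " ".join(boot_cmd), "with", subset)
        ok = subprocess.run(boot_cmd).returncode == 0
        if ok:
            subprocess.run(["mpdtrace", "-l"])
        subprocess.run(["mpdallexit"])  # bring the ring down before the next test
        if not ok:
            print("mpdboot failed after adding", subset[-1])
            break
```

If a round fails, the host added in that round is the first one to examine with the mpdcheck tests from step 6.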

[ Last edited by alwens on 2008-1-18 at 14:32 ]
Post #2, 2008-01-18 14:31:21