
Views: 1293  |  Replies: 7

gleerat


[Discussion] [Help] Problem running VASP in parallel (5 participants so far)

Under the root account:
CODE:
[root@node1 zhuqx_hp]# which vasp.openmpi
/home/bin/vasp.openmpi
[root@node1 zhuqx_hp]# which mpirun
/home/software/mpich-1.2.7-intel9/bin/mpirun
[root@node1 zhuqx_hp]#

Under a regular user account:
CODE:
[zhuqx_hp@node1 lt]$ which vasp.openmpi
/usr/bin/which: no vasp.openmpi in (/usr/local/bin:/bin:/usr/bin:/usr/X11R6/bin:/home/software/wien2k_shmem:/home/software/wien2k_shmem/SRC_structeditor/bin:.:/home/software/wien2k_shmem:.:/home/users/zhuqx_hp/bin)
[zhuqx_hp@node1 lt]$ which mpirun
/usr/bin/mpirun
[zhuqx_hp@node1 lt]$

So I did not know which path to use and simply tried them all. Using the mpirun location from the root account:
CODE:
[zhuqx_hp@node1 lt]$ /home/software/mpich-1.2.7-intel9/bin/mpirun -np 4 /home/bin/vasp.openmpi
  running on    1 nodes
  distr:  one band on    1 nodes,    1 groups
  vasp.4.6.28 25Jul05 complex
   POSCAR found :  3 types and   20 ions
  LDA part: xc-table for Ceperly-Alder, standard interpolation
  POSCAR, INCAR and KPOINTS ok, starting setup
  WARNING: wrap around errors must be expected
  FFT: planning ...           5
  reading WAVECAR
  LAPACK: Routine ZPOTRF failed!          67

Using the mpirun location from the regular account, I got:
CODE:
[zhuqx_hp@node1 lt]$ /usr/bin/mpirun -np 4 /home/bin/vasp.openmpi
  running on    1 nodes
  running on    1 nodes
  running on    1 nodes
  distr:  one band on    1 nodes,    1 groups
  distr:  one band on    1 nodes,    1 groups
  vasp.4.6.28 25Jul05 complex
   vasp.4.6.28 25Jul05 complex
   distr:  one band on    1 nodes,    1 groups
  vasp.4.6.28 25Jul05 complex
   POSCAR found :  3 types and   20 ions
   POSCAR found :  3 types and   20 ions
   POSCAR found :  3 types and   20 ions
  running on    1 nodes
  distr:  one band on    1 nodes,    1 groups
  vasp.4.6.28 25Jul05 complex
   POSCAR found :  3 types and   20 ions
  LDA part: xc-table for Ceperly-Alder, standard interpolation
  LDA part: xc-table for Ceperly-Alder, standard interpolation
  LDA part: xc-table for Ceperly-Alder, standard interpolation
  LDA part: xc-table for Ceperly-Alder, standard interpolation
  POSCAR, INCAR and KPOINTS ok, starting setup
  POSCAR, INCAR and KPOINTS ok, starting setup
  POSCAR, INCAR and KPOINTS ok, starting setup
  POSCAR, INCAR and KPOINTS ok, starting setup
  WARNING: wrap around errors must be expected
  WARNING: wrap around errors must be expected
  WARNING: wrap around errors must be expected
  WARNING: wrap around errors must be expected
  FFT: planning ...           5
  reading WAVECAR
  FFT: planning ...           5
  FFT: planning ...           5
  FFT: planning ...           5
  reading WAVECAR
  reading WAVECAR
  reading WAVECAR
  LAPACK: Routine ZPOTRF failed!          67
-----------------------------------------------------------------------------
It seems that [at least] one of the processes that was started with mpirun did not invoke MPI_INIT before quitting (it is possible that more than one process did not invoke MPI_INIT -- mpirun was only notified of the first one, which was on node n0).  mpirun can *only* be used with MPI programs (i.e., programs that invoke MPI_INIT and MPI_FINALIZE).  You can use the "lamexec" program to run non-MPI programs over the lambooted nodes.
-----------------------------------------------------------------------------
forrtl: error (78): process killed (SIGTERM)
forrtl: error (78): process killed (SIGTERM)
forrtl: error (78): process killed (SIGTERM)
[zhuqx_hp@node1 lt]$

Also, the original script gives the mpirun location as /home/software/openmpi-1.2.2-intel9/bin/mpirun. With that path the result is:
CODE:
[zhuqx_hp@node1 lt]$ /home/software/openmpi-1.2.2-intel9/bin/mpirun -np 4 /home/bin/vasp.openmpi
  running on    4 nodes
  distr:  one band on    1 nodes,    4 groups
  vasp.4.6.28 25Jul05 complex
   POSCAR found :  3 types and   20 ions
  LDA part: xc-table for Ceperly-Alder, standard interpolation
  POSCAR, INCAR and KPOINTS ok, starting setup
  WARNING: wrap around errors must be expected
  FFT: planning ...           2
  reading WAVECAR
  LAPACK: Routine ZPOTRF failed!          67
  LAPACK: Routine ZPOTRF failed!          67
  LAPACK: Routine ZPOTRF failed!          67
  LAPACK: Routine ZPOTRF failed!          67

Could someone explain what is going on and suggest a solution? Thanks.

[ Last edited by gleerat on 2010-5-25 at 22:26 ]

zzy870720z

Try testing MPI with mpdcheck.
It may be an MPI problem.
Post #2, 2010-05-25 22:51:41

gleerat


Quote:
Originally posted by zzy870720z at 2010-05-25 22:51:41:
Try testing MPI with mpdcheck.
It may be an MPI problem.

It turns out the cluster does not even have mpd!
Post #3, 2010-05-25 23:24:40

bluewhale

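Read the message carefully: VASP was compiled with Open MPI, yet you are running it under the system's LAM/MPI environment (the one in /usr/bin)!
On top of that, you also bring in MPICH version 1. You are mixing three different things together!

Make sure the build environment and the runtime environment match, and preferably give absolute paths for everything.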
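
One rough way to check which MPI a binary was built against is to look at the shared libraries it links to. The sketch below is only a heuristic under stated assumptions: the helper name guess_mpi_flavor is made up for illustration, the library name patterns (libopen-rte / libopen-pal for Open MPI, liblam for LAM/MPI, libmpich for MPICH) are the usual ones but may differ on a given build, and a statically linked vasp will show no MPI library at all.

```shell
#!/bin/sh
# Hypothetical helper: read `ldd` output on stdin and print a best guess
# at the MPI flavor. An Open MPI match takes precedence over the others.
guess_mpi_flavor() {
    awk '
        /libopen-rte|libopen-pal/ { found = "openmpi" }
        /liblam\./                { if (found == "") found = "lam" }
        /libmpich/                { if (found == "") found = "mpich" }
        END { print (found == "" ? "unknown" : found) }
    '
}
```

Possible usage, with the binary path from this thread: ldd /home/bin/vasp.openmpi | guess_mpi_flavor. An answer of "unknown" does not prove anything by itself; it may just mean the MPI library was linked statically.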



[ Last edited by bluewhale on 2010-5-26 at 00:07 ]
Post #4, 2010-05-25 23:58:45

163.com

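Quote:
Originally posted by bluewhale at 2010-05-25 23:58:45:
Read the message carefully: VASP was compiled with Open MPI, yet you are running it under the system's LAM/MPI environment (the one in /usr/bin)!
On top of that, you also bring in MPICH version 1. You are mixing three different things together!

Make sure the build environment and the runtime environment match, and preferably give absolute ...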

Follow the hint and try a few more times, and pay attention to the build environment.
Post #5, 2010-05-26 07:02:09

gleerat


Quote:
Originally posted by 163.com at 2010-05-26 07:02:09:
Follow the hint and try a few more times, and pay attention to the build environment.

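I know /usr/bin/mpirun will not work, because it reports an MPI parallelization problem;
/home/software/mpich-1.2.7-intel9/bin/mpirun also seems wrong, because the run is not parallel;
and judging from the output of /home/software/openmpi-1.2.2-intel9/bin/mpirun (running on    4 nodes
distr:  one band on    1 nodes,    4 groups), that should be the right one.
But how do I fix "LAPACK: Routine ZPOTRF failed!          67"?!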
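
The reasoning above (one "running on 4 nodes" header means a real parallel job, several "running on 1 nodes" headers mean independent serial copies) can be turned into a quick check on the VASP standard output. This is a minimal sketch; check_vasp_header is a hypothetical name, and it relies only on the header format shown in the logs in this thread.

```shell
#!/bin/sh
# Hypothetical helper: read VASP stdout on stdin and succeed only if
# exactly one "running on N nodes" header appears and N equals the
# process count given as the first argument.
check_vasp_header() {
    expected=$1
    awk -v np="$expected" '
        /running on +[0-9]+ nodes/ { count++; if (count == 1) nodes = $3 }
        END {
            if (count == 1 && nodes == np) {
                print "ok: one job on " nodes " processes"
                exit 0
            }
            print "mismatch: " count+0 " header(s), first reports " nodes+0 " process(es)"
            exit 1
        }'
}
```

For example, saving the output of a run to a file (say vasp.out, an assumed name) and running check_vasp_header 4 < vasp.out would fail for the /usr/bin/mpirun log above, which prints four separate headers, and pass for the Open MPI log. It says nothing about the ZPOTRF failure itself.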
Post #6, 2010-05-26 10:38:52

bluewhale

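Your best bet is to download the latest Open MPI and rebuild VASP with the Intel Fortran compiler + MKL. We have been using that setup all along and nobody has ever complained.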
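
When rebuilding, it may help to confirm that the compiler wrapper used for the build and the mpirun used at run time come from the same installation. The sketch below only compares installation prefixes derived from the two paths; the function names are hypothetical, and the mpif90 path in the usage note is an assumption based on the Open MPI directory mentioned earlier in the thread.

```shell
#!/bin/sh
# Print the installation prefix of an MPI tool given its absolute path,
# e.g. /opt/x/bin/mpirun -> /opt/x
mpi_prefix() {
    dirname "$(dirname "$1")"
}

# Succeed only if the compiler wrapper ($1) and the launcher ($2)
# live under the same installation prefix.
same_mpi_install() {
    [ "$(mpi_prefix "$1")" = "$(mpi_prefix "$2")" ]
}
```

A call such as same_mpi_install /home/software/openmpi-1.2.2-intel9/bin/mpif90 /usr/bin/mpirun returns a non-zero status, which flags exactly the kind of mix-up discussed above.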

[ Last edited by bluewhale on 2010-5-26 at 22:36 ]
Post #7, 2010-05-26 22:35:05

cahndengbin

[zhuqx_hp@node1 lt]$ /usr/bin/mpirun -np 4 /home/bin/vasp.openmpi
  running on    1 nodes
  running on    1 nodes
  running on    1 nodes
  distr:  one band on    1 nodes,    1 groups
  distr:  one band on    1 nodes,    1 groups
  vasp.4.6.28 25Jul05 complex
   vasp.4.6.28 25Jul05 complex
   distr:  one band on    1 nodes,    1 groups
  vasp.4.6.28 25Jul05 complex
   POSCAR found :  3 types and   20 ions
   POSCAR found :  3 types and   20 ions
   POSCAR found :  3 types and   20 ions
  running on    1 nodes
  distr:  one band on    1 nodes,    1 groups
  vasp.4.6.28 25Jul05 complex
   POSCAR found :  3 types and   20 ions

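I suspect the mistake is in the choice of mpirun: your /usr/bin/mpirun is probably the mpirun from MPICH (or, going by the lamexec hint in the error message, from LAM/MPI) rather than the one from Open MPI.
If VASP was compiled with Open MPI, the parallel command should be /opt/openmpi/bin/mpirun -np 4 vasp.openmpi, that is, use the mpirun in Open MPI's bin directory, not the one from MPICH.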
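
A small launcher function can make it harder to pick up the wrong mpirun by accident. This is a sketch under assumptions: run_vasp, MPI_HOME and VASP_BIN are names chosen for illustration, the default paths are the ones quoted earlier in this thread (not /opt/openmpi), and the lib directory name may differ on a given installation.

```shell
#!/bin/sh
# Assumed locations, taken from paths quoted in this thread; adjust as needed.
MPI_HOME=${MPI_HOME:-/home/software/openmpi-1.2.2-intel9}
VASP_BIN=${VASP_BIN:-/home/bin/vasp.openmpi}

# Launch VASP with the mpirun inside $MPI_HOME, never the one that
# happens to come first in $PATH. First argument: number of processes.
run_vasp() {
    np=$1
    launcher="$MPI_HOME/bin/mpirun"
    if [ ! -x "$launcher" ]; then
        echo "no executable mpirun under $MPI_HOME/bin" >&2
        return 1
    fi
    # Put the same installation first for helper programs and shared
    # libraries (the "lib" directory name is an assumption).
    PATH="$MPI_HOME/bin:$PATH"
    LD_LIBRARY_PATH="$MPI_HOME/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    export PATH LD_LIBRARY_PATH
    "$launcher" -np "$np" "$VASP_BIN"
}
```

Calling run_vasp 4 from the calculation directory would then be equivalent to typing the full mpirun path by hand, with the added check that the launcher actually exists.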
Post #8, 2014-10-09 16:40:11