
wnryc

New member (novice)

[Discussion] [Help] Running VASP: mpich2 sometimes breaks down

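I posted the same question earlier, but the problem was not solved. It is fairly urgent, so I hope someone can help. Thanks!

I am running parallel VASP on RHEL 5.4 with mpich2-1.2.1p1, pgi-9.0.1 and a dual-core Xeon E5504 (Intel CPU). Some jobs run in parallel without trouble (using the commands: mpiexec -n 8 vasp.pgi >out& or mpiexec -n 8 vasp.pgi   out& or mpiexec -n 8 vasp.pgi  ), while other jobs fail to run (my mpich2-1.2.1p1 and pgi-9.0.1 installations themselves are fine). In the failing cases, after the program has read the INCAR, POTCAR, POSCAR and KPOINTS files, the following errors appear on screen: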
----------------------------------------------------------------------
running on    8 nodes
distr:  one band on    1 nodes,    8 groups
vasp.4.6.21  23Feb03 complex
POSCAR found :  3 types and   30 ions
LDA part: xc-table for Ceperly-Alder, Vosko type interpolation para-ferro
POSCAR, INCAR and KPOINTS ok, starting setup
WARNING: wrap around errors must be expected
mpiexec_qltang1 (handle_stdin_input 1089): stdin problem; if pgm is run in background, redirect from /dev/null
mpiexec_qltang1 (handle_stdin_input 1090):     e.g.: mpiexec -n 4 a.out < /dev/null &
mpiexec_qltang1 (handle_stdin_input 1089): stdin problem; if pgm is run in background, redirect from /dev/null
mpiexec_qltang1 (handle_stdin_input 1090):     e.g.: mpiexec -n 4 a.out < /dev/null &
mpiexec_qltang1 (handle_stdin_input 1089): stdin problem; if pgm is run in background, redirect from /dev/null
mpiexec_qltang1 (handle_stdin_input 1090):     e.g.: mpiexec -n 4 a.out < /dev/null &
mpiexec_qltang1 (handle_stdin_input 1089): stdin problem; if pgm is run in background, redirect from /dev/null
mpiexec_qltang1 (handle_stdin_input 1090):     e.g.: mpiexec -n 4 a.out < /dev/null &
mpiexec_qltang1 (handle_stdin_input 1089): stdin problem; if pgm is run in background, redirect from /dev/null
mpiexec_qltang1 (handle_stdin_input 1090):     e.g.: mpiexec -n 4 a.out < /dev/null &
mpiexec_qltang1 (handle_stdin_input 1089): stdin problem; if pgm is run in background, redirect from /dev/null
mpiexec_qltang1 (handle_stdin_input 1090):     e.g.: mpiexec -n 4 a.out < /dev/null &
mpiexec_qltang1 (handle_stdin_input 1089): stdin problem; if pgm is run in background, redirect from /dev/null
mpiexec_qltang1 (handle_stdin_input 1090):     e.g.: mpiexec -n 4 a.out < /dev/null &
FFT: planning ...            2
reading WAVECAR
mpiexec_qltang1 (handle_stdin_input 1089): stdin problem; if pgm is run in background, redirect from /dev/null
mpiexec_qltang1 (handle_stdin_input 1090):     e.g.: mpiexec -n 4 a.out < /dev/null &
mpiexec_qltang1 (handle_stdin_input 1089): stdin problem; if pgm is run in background, redirect from /dev/null
mpiexec_qltang1 (handle_stdin_input 1090):     e.g.: mpiexec -n 4 a.out < /dev/null &
mpiexec_qltang1 (handle_stdin_input 1089): stdin problem; if pgm is run in background, redirect from /dev/null
mpiexec_qltang1 (handle_stdin_input 1090):     e.g.: mpiexec -n 4 a.out < /dev/null &
mpiexec_qltang1 (handle_stdin_input 1089): stdin problem; if pgm is run in background, redirect from /dev/null
mpiexec_qltang1 (handle_stdin_input 1090):     e.g.: mpiexec -n 4 a.out < /dev/null &
mpiexec_qltang1 (handle_stdin_input 1089): stdin problem; if pgm is run in background, redirect from /dev/null
mpiexec_qltang1 (handle_stdin_input 1090):     e.g.: mpiexec -n 4 a.out < /dev/null &
mpiexec_qltang1 (handle_stdin_input 1089): stdin problem; if pgm is run in background, redirect from /dev/null
mpiexec_qltang1 (handle_stdin_input 1090):     e.g.: mpiexec -n 4 a.out < /dev/null &
mpiexec_qltang1 (handle_stdin_input 1089): stdin problem; if pgm is run in background, redirect from /dev/null
mpiexec_qltang1 (handle_stdin_input 1090):     e.g.: mpiexec -n 4 a.out < /dev/null &
mpiexec_qltang1 (handle_stdin_input 1089): stdin problem; if pgm is run in background, redirect from /dev/null
mpiexec_qltang1 (handle_stdin_input 1090):     e.g.: mpiexec -n 4 a.out < /dev/null &
mpiexec_qltang1 (handle_stdin_input 1089): stdin problem; if pgm is run in background, redirect from /dev/null
mpiexec_qltang1 (handle_stdin_input 1090):     e.g.: mpiexec -n 4 a.out < /dev/null &
mpiexec_qltang1 (handle_stdin_input 1089): stdin problem; if pgm is run in background, redirect from /dev/null
mpiexec_qltang1 (handle_stdin_input 1090):     e.g.: mpiexec -n 4 a.out < /dev/null &
mpiexec_qltang1 (handle_stdin_input 1089): stdin problem; if pgm is run in background, redirect from /dev/null
mpiexec_qltang1 (handle_stdin_input 1090):     e.g.: mpiexec -n 4 a.out < /dev/null &
mpiexec_qltang1 (handle_stdin_input 1089): stdin problem; if pgm is run in background, redirect from /dev/null
mpiexec_qltang1 (handle_stdin_input 1090):     e.g.: mpiexec -n 4 a.out < /dev/null &
mpiexec_qltang1 (handle_stdin_input 1089): stdin problem; if pgm is run in background, redirect from /dev/null
mpiexec_qltang1 (handle_stdin_input 1090):     e.g.: mpiexec -n 4 a.out < /dev/null &
mpiexec_qltang1 (handle_stdin_input 1089): stdin problem; if pgm is run in background, redirect from /dev/null
mpiexec_qltang1 (handle_stdin_input 1090):     e.g.: mpiexec -n 4 a.out < /dev/null &
mpiexec_qltang1 (handle_stdin_input 1089): stdin problem; if pgm is run in background, redirect from /dev/null
mpiexec_qltang1 (handle_stdin_input 1090):     e.g.: mpiexec -n 4 a.out < /dev/null &
mpiexec_qltang1 (handle_stdin_input 1089): stdin problem; if pgm is run in background, redirect from /dev/null
mpiexec_qltang1 (handle_stdin_input 1090):     e.g.: mpiexec -n 4 a.out < /dev/null &
mpiexec_qltang1 (handle_stdin_input 1089): stdin problem; if pgm is run in background, redirect from /dev/null
mpiexec_qltang1 (handle_stdin_input 1090):     e.g.: mpiexec -n 4 a.out < /dev/null &
mpiexec_qltang1 (handle_stdin_input 1089): stdin problem; if pgm is run in background, redirect from /dev/null
mpiexec_qltang1 (handle_stdin_input 1090):     e.g.: mpiexec -n 4 a.out < /dev/null &
mpiexec_qltang1 (handle_stdin_input 1089): stdin problem; if pgm is run in background, redirect from /dev/null
mpiexec_qltang1 (handle_stdin_input 1090):     e.g.: mpiexec -n 4 a.out < /dev/null &
mpiexec_qltang1 (handle_stdin_input 1089): stdin problem; if pgm is run in background, redirect from /dev/null
mpiexec_qltang1 (handle_stdin_input 1090):     e.g.: mpiexec -n 4 a.out < /dev/null &
mpiexec_qltang1 (handle_stdin_input 1089): stdin problem; if pgm is run in background, redirect from /dev/null
mpiexec_qltang1 (handle_stdin_input 1090):     e.g.: mpiexec -n 4 a.out < /dev/null &
WARNING: random wavefunctions but no delay for mixing, default for NELMDL
entering main loop
       N       E                     dE             d eps       ncg     rms          rms(c)
rank 6 in job 36  qltang1_54199   caused collective abort of all ranks
  exit status of rank 6: killed by signal 9
rank 3 in job 36  qltang1_54199   caused collective abort of all ranks
  exit status of rank 3: killed by signal 9

I hope to get some help. Thanks!

Dr.Qian-LinTang

wnryc

New member (novice)

It is not a problem with the mpiexec command. I tried all of the variants above and got the same error message.
Dr.Qian-LinTang
Post 3, 2010-06-17 15:23:13

gump_813276

Bronze member (somewhat known)

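Did you try what I suggested in my last reply?
The message above says it is a stdin problem,
so I think your first command is the right one: mpiexec -n 8 vasp.pgi >out&
Give it a try and report back if there is still a problem.
Post 2, 2010-06-17 12:19:32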
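
The mpiexec message in the first post names its own workaround: when the program runs in the background, redirect stdin from /dev/null. A minimal sketch of that pattern follows. `run_detached` and `detached_pid` are hypothetical names introduced here, and `cat` stands in for the real `mpiexec -n 8 vasp.pgi` command so the snippet can run without MPI installed. This only addresses the stdin messages; it is not expected to explain the later "killed by signal" aborts.

```shell
#!/bin/sh
# run_detached LOGFILE CMD [ARGS...]
# Hypothetical helper: start CMD in the background with stdin redirected
# from /dev/null and stdout+stderr collected in LOGFILE.
# The PID of the background job is left in $detached_pid.
run_detached() {
    detached_log=$1
    shift
    "$@" < /dev/null > "$detached_log" 2>&1 &
    detached_pid=$!
}

# Stand-in demo: "cat" reads stdin, sees end-of-file at once, and exits.
# The real call would look like: run_detached out mpiexec -n 8 vasp.pgi
demo_log=$(mktemp)
run_detached "$demo_log" cat
wait "$detached_pid"
echo "background job exited with status $?"
rm -f "$demo_log"
```

The helper stores the PID in a variable instead of printing it, because a job started inside `$(...)` would belong to a subshell and could not be waited on from the calling shell.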

valenhou001

Supreme member (professional writer)

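#!/bin/sh
# List the mpd daemons in the ring.
mpdtrace -l
# Check the connectivity.
mpdringtest 100

# Run VASP on 2 processes, sending stdout and stderr to the file "out".
mpiexec -n 2 /path/to/vasp > out 2>&1

# Shut down all mpd daemons.
mpdallexit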


Try the above.
Post 4, 2010-06-17 15:33:53

wnryc

New member (novice)

Following the suggestion, I ran these steps:
[qltang@qltang1 pgi]$ /bin/sh
sh-3.2$ mpdtrace -l
qltang1_54199 (127.0.0.1)
sh-3.2$ mpdringtest 100
time for 100 loops = 0.0500471591949 seconds
sh-3.2$ mpiexec -n  2 /usr/local/bin/ >out 2>&1
sh-3.2$  mpiexec -n  2 /usr/local/bin/vasp.pgi >out 2>&1
sh-3.2$
The result is the same: VASP only read the input files and then exited, namely:
running on    2 nodes
distr:  one band on    1 nodes,    2 groups
vasp.4.6.21  23Feb03 complex
POSCAR found :  3 types and   48 ions
LDA part: xc-table for Ceperly-Alder, Vosko type interpolation para-ferro
POSCAR, INCAR and KPOINTS ok, starting setup
WARNING: wrap around errors must be expected
FFT: planning ...           10
reading WAVECAR
entering main loop
       N       E                     dE             d eps       ncg     rms          rms(c)
rank 1 in job 63  qltang1_54199   caused collective abort of all ranks
  exit status of rank 1: killed by signal 11
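
The runs in this thread end differently: in the first post ranks 6 and 3 were "killed by signal 9", while here rank 1 was "killed by signal 11". As a small aid, the shell's built-in `kill -l` translates a signal number into its name. On Linux, 9 is KILL (the process was terminated from outside, for example by the kernel's out-of-memory killer or by another process) and 11 is SEGV (an invalid memory access inside the program). The logs do not establish which of these causes applies; the sketch below only decodes the numbers, and `signal_name` is a hypothetical helper name.

```shell
#!/bin/sh
# Print the symbolic name for each signal number given as an argument.
signal_name() {
    for num in "$@"; do
        printf '%s -> SIG%s\n' "$num" "$(kill -l "$num")"
    done
}

# The two signal numbers reported by mpiexec in this thread.
signal_name 9 11
```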
  
I also asked mpich2 support. They said the cause is probably not mpich2 and that I should look at a core dump. Their reply was: “"ulimit -c unlimited" is the usual means to enable core dumps.  If that's not working for you, then either your VASP program isn't dumping core or core dumps must be enabled some other way.  You'll have to google for the appropriate way to enable them on your platform.”
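
Since `ulimit` affects only the current shell and the programs started from it, it can help to confirm that the limit really changed in the shell that launches VASP. A minimal sketch, assuming a Linux system; `show_core_settings` is a hypothetical helper name. For mpd-launched jobs the ranks may inherit their limits from the mpd daemon and not from this shell, which is an assumption worth checking on the machine in question.

```shell
#!/bin/sh
# Report the core-dump settings that apply to programs started from this shell.
show_core_settings() {
    echo "soft core limit: $(ulimit -S -c)"
    echo "hard core limit: $(ulimit -H -c)"
    # On Linux this file controls the name and location of core files.
    if [ -r /proc/sys/kernel/core_pattern ]; then
        echo "core pattern: $(cat /proc/sys/kernel/core_pattern)"
    fi
}

# Try to lift the soft limit; this fails if the hard limit is lower.
ulimit -S -c unlimited 2>/dev/null || echo "could not raise the core limit" >&2
show_core_settings
```

If the soft limit still prints 0 after this, no core file will be written no matter how the program crashes.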

So I also compiled a serial VASP and first ran these commands:
ulimit -c unlimited
ulimit -s unlimited
Then I ran the serial binary vasp.pgi.serial. Again VASP exited right after reading the input files, this time with the following message:
[qltang@qltang1 pgi2]$ vasp.pgi.serial
vasp.4.6.21  23Feb03 complex
POSCAR found :  3 types and   48 ions
LDA part: xc-table for Ceperly-Alder, Vosko type interpolation para-ferro
POSCAR, INCAR and KPOINTS ok, starting setup
WARNING: wrap around errors must be expected
FFT: planning ...           16
reading WAVECAR
entering main loop
       N       E                     dE             d eps       ncg     rms          rms(c)
*** glibc detected *** vasp.pgi.serial: free(): invalid next size (fast): 0x0000000005310760 ***
======= Backtrace: =========
/lib64/libc.so.6[0x3fd74722ef]
/lib64/libc.so.6(cfree+0x4b)[0x3fd747273b]
vasp.pgi.serial[0x4f6176]
======= Memory map: ========
00400000-006bc000 r-xp 00000000 08:03 1770734                            /usr/local/bin/vasp.pgi.serial
008bb000-008e8000 rwxp 002bb000 08:03 1770734                            /usr/local/bin/vasp.pgi.serial
008e8000-00c32000 rwxp 008e8000 00:00 0
05243000-05a4a000 rwxp 05243000 00:00 0                                  [heap]
3fd6c00000-3fd6c1c000 r-xp 00000000 08:03 3204371                        /lib64/ld-2.5.so
3fd6e1b000-3fd6e1c000 r-xp 0001b000 08:03 3204371                        /lib64/ld-2.5.so
3fd6e1c000-3fd6e1d000 rwxp 0001c000 08:03 3204371                        /lib64/ld-2.5.so
3fd7400000-3fd754d000 r-xp 00000000 08:03 3204372                        /lib64/libc-2.5.so
3fd754d000-3fd774d000 ---p 0014d000 08:03 3204372                        /lib64/libc-2.5.so
3fd774d000-3fd7751000 r-xp 0014d000 08:03 3204372                        /lib64/libc-2.5.so
3fd7751000-3fd7752000 rwxp 00151000 08:03 3204372                        /lib64/libc-2.5.so
3fd7752000-3fd7757000 rwxp 3fd7752000 00:00 0
3fd7c00000-3fd7c82000 r-xp 00000000 08:03 3204376                        /lib64/libm-2.5.so
3fd7c82000-3fd7e81000 ---p 00082000 08:03 3204376                        /lib64/libm-2.5.so
3fd7e81000-3fd7e82000 r-xp 00081000 08:03 3204376                        /lib64/libm-2.5.so
3fd7e82000-3fd7e83000 rwxp 00082000 08:03 3204376                        /lib64/libm-2.5.so
3fd8000000-3fd8016000 r-xp 00000000 08:03 3204374                        /lib64/libpthread-2.5.so
3fd8016000-3fd8215000 ---p 00016000 08:03 3204374                        /lib64/libpthread-2.5.so
3fd8215000-3fd8216000 r-xp 00015000 08:03 3204374                        /lib64/libpthread-2.5.so
3fd8216000-3fd8217000 rwxp 00016000 08:03 3204374                        /lib64/libpthread-2.5.so
3fd8217000-3fd821b000 rwxp 3fd8217000 00:00 0
3fd8800000-3fd8807000 r-xp 00000000 08:03 3204377                        /lib64/librt-2.5.so
3fd8807000-3fd8a07000 ---p 00007000 08:03 3204377                        /lib64/librt-2.5.so
3fd8a07000-3fd8a08000 r-xp 00007000 08:03 3204377                        /lib64/librt-2.5.so
3fd8a08000-3fd8a09000 rwxp 00008000 08:03 3204377                        /lib64/librt-2.5.so
3fdd000000-3fdd00d000 r-xp 00000000 08:03 3202051                        /lib64/libgcc_s-4.1.2-20080825.so.1
3fdd00d000-3fdd20d000 ---p 0000d000 08:03 3202051                        /lib64/libgcc_s-4.1.2-20080825.so.1
3fdd20d000-3fdd20e000 rwxp 0000d000 08:03 3202051                        /lib64/libgcc_s-4.1.2-20080825.so.1
2abf81586000-2abf8158c000 rwxp 2abf81586000 00:00 0
2abf815ad000-2abf94b2b000 rwxp 2abf815ad000 00:00 0
7fff8fa7d000-7fff8fa92000 rwxp 7ffffffea000 00:00 0                      [stack]
ffffffffff600000-ffffffffffe00000 ---p 00000000 00:00 0                  [vdso]
Aborted (core dumped)
[qltang@qltang1 pgi2]$

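I do not know where the cause lies. The puzzling part is that some jobs run fine in serial and produce results, while for other serial jobs vasp stops right after reading the input files (with the error shown above). I hope this problem can be solved soon. Thanks!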
Dr.Qian-LinTang
Post 5, 2010-06-17 17:16:38
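
The serial run in post 5 ends with "Aborted (core dumped)", so a core file should exist in the working directory. A minimal sketch for locating it and printing a gdb command to open it follows. `newest_core` is a hypothetical helper name, the file names `core` and `core.<pid>` are common defaults and not confirmed for this machine, and the binary path is the one shown in the memory map above. Without debugging symbols the backtrace may show little more than addresses.

```shell
#!/bin/sh
# newest_core DIR: print the most recently modified core file in DIR, if any.
newest_core() {
    dir=${1:-.}
    # Core files are commonly named "core" or "core.<pid>" (an assumption here).
    ls -td "$dir"/core "$dir"/core.* 2>/dev/null | head -n 1
}

core=$(newest_core .)
if [ -n "$core" ]; then
    # Inside gdb, the "bt" command prints the backtrace of the crash.
    echo "inspect with: gdb /usr/local/bin/vasp.pgi.serial $core"
else
    echo "no core file found in the current directory"
fi
```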