wnryc

[Discussion] [Help request] Running VASP: mpich2 sometimes breaks down

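I posted the same question earlier, but the problem was not solved. This is fairly urgent, so I hope you can help. Thanks!

I am running parallel VASP on RHEL 5.4 with mpich2-1.2.1p1 and pgi-9.0.1 on a dual-core Xeon E5504 (Intel CPU). Some jobs run correctly in parallel (using the command: mpiexec -n 8 vasp.pgi >out& or mpiexec -n 8 vasp.pgi   out& or mpiexec -n 8 vasp.pgi  ), but other jobs fail to run (I installed mpich2-1.2.1p1; pgi-9.0.1 itself works fine). In the failing cases the program reads the INCAR, POTCAR, POSCAR and KPOINTS files and then prints the following errors to the screen: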
----------------------------------------------------------------------
running on    8 nodes
distr:  one band on    1 nodes,    8 groups
vasp.4.6.21  23Feb03 complex
POSCAR found :  3 types and   30 ions
LDA part: xc-table for Ceperly-Alder, Vosko type interpolation para-ferro
POSCAR, INCAR and KPOINTS ok, starting setup
WARNING: wrap around errors must be expected
mpiexec_qltang1 (handle_stdin_input 1089): stdin problem; if pgm is run in background, redirect from /dev/null
mpiexec_qltang1 (handle_stdin_input 1090):     e.g.: mpiexec -n 4 a.out < /dev/null &
[the two stdin lines above repeat 6 more times]
FFT: planning ...            2
reading WAVECAR
mpiexec_qltang1 (handle_stdin_input 1089): stdin problem; if pgm is run in background, redirect from /dev/null
mpiexec_qltang1 (handle_stdin_input 1090):     e.g.: mpiexec -n 4 a.out < /dev/null &
[the two stdin lines above repeat 20 more times]
WARNING: random wavefunctions but no delay for mixing, default for NELMDL
entering main loop
       N       E                     dE             d eps       ncg     rms          rms(c)
rank 6 in job 36  qltang1_54199   caused collective abort of all ranks
  exit status of rank 6: killed by signal 9
rank 3 in job 36  qltang1_54199   caused collective abort of all ranks
  exit status of rank 3: killed by signal 9
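
The stdin messages in the log above carry their own suggestion: redirect stdin from /dev/null when mpiexec runs in the background. Below is a minimal sketch of that, assuming a POSIX shell. launch_bg is a hypothetical helper name (not part of mpich2), and the log file name out is taken from the commands quoted earlier. This would presumably only silence the stdin messages; the "killed by signal 9" abort at the end of the log may well have a separate cause.

```shell
#!/bin/sh
# launch_bg LOGFILE CMD [ARGS...]
# Hypothetical helper: start CMD in the background with
#   - stdin read from /dev/null (what the mpiexec message asks for)
#   - stdout and stderr written to LOGFILE
# and print the PID of the background job.
launch_bg() {
    log=$1
    shift
    "$@" < /dev/null > "$log" 2>&1 &
    echo $!
}

# Intended use with the command from the post (not executed here):
#   launch_bg out mpiexec -n 8 vasp.pgi
```

Written out by hand, the same launch would be mpiexec -n 8 vasp.pgi < /dev/null > out 2>&1 &, which matches the form shown in the mpiexec hint.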

I hope someone can help. Thanks!
Dr.Qian-LinTang

wnryc

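I suspect this is a stack size limit problem.
I also tried gump_813276's suggestion of adding a limit.c file to the VASP source. Following those instructions, I added the call at this point in main.F: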
      CHARACTER (LEN=5)   IDENTIFY
!-----parameters for sphpro.f
      INTEGER :: LDIMP,LMDIMP,LTRUNC=3

!=======================================================================
! All COMMON blocks
!=======================================================================
      INTEGER IXMIN,IXMAX,IYMIN,IYMAX,IZMIN,IZMAX
      COMMON /WAVCUT/ IXMIN,IXMAX,IYMIN,IYMAX,IZMIN,IZMAX

      INTEGER  ISYMOP,NROT,IGRPOP,NROTK,INVMAP,NPCELL
      REAL(q)  GTRANS,AP
      REAL(q)  RHOTOT(4)
      INTEGER(8) IL,I1,I2_0,I3,I4
#ifdef gammareal
      CHARACTER (LEN=*),PARAMETER :: VASP='vasp.4.6.21  13Feb03 gamma-only'
#else
      CHARACTER (LEN=*),PARAMETER :: VASP='vasp.4.6.21  23Feb03 complex '
#endif

      COMMON /SYMM/   ISYMOP(3,3,48),NROT,IGRPOP(3,3,48),NROTK, &
     &                GTRANS(3,48),INVMAP(48),AP(3,3),NPCELL

!=======================================================================
!  initialise / set constants and parameters ...
!=======================================================================
      IO%LOPEN =.TRUE.  ! open all files with file names
      IO%IU0   = 6
      IO%IU6   = 8
#ifdef Fujitsu
      IO%IU5   = 7
#else
      IO%IU5   = 5
#endif
!R.S
      tiu6 = IO%IU6
      tiu0 = IO%IU0

      IO%ICMPLX=ICMPLX
      IO%MRECL =MRECL
      PRED%ICMPLX=ICMPLX

      CALL stacksize()   ! note: this is the newly added line

      CALL TIMING(0,UTIME,STIME,ETIME,MINPGF,MAJPGF, &
     &            RSIZM,AVSIZ,ISWPS,IOOPS,IVCSW,IERR)
      IF (IERR/=0) ETIME=0._q
! switch off kill
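I then compiled the serial VASP and obtained the vasp binary. But when I run a test case, VASP again fails right after reading the input files, with this error: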
[qltang@qltang1 pgi]$ !vasp
vasp
Before: cur=-1,hard=-1
After: res=0,cur=-1,hard=-1
vasp.4.6.21  23Feb03 complex
POSCAR found :  3 types and   48 ions
LDA part: xc-table for Ceperly-Alder, Vosko type interpolation para-ferro
POSCAR, INCAR and KPOINTS ok, starting setup
WARNING: wrap around errors must be expected
FFT: planning ...           16
reading WAVECAR
entering main loop
       N       E                     dE             d eps       ncg     rms          rms(c)
*** glibc detected *** vasp: free(): invalid next size (fast): 0x0000000011c92760 ***
======= Backtrace: =========
/lib64/libc.so.6[0x3fd74722ef]
/lib64/libc.so.6(cfree+0x4b)[0x3fd747273b]
vasp[0x4f6176]
======= Memory map: ========
00400000-006bc000 r-xp 00000000 08:03 1770606                            /usr/local/bin/vasp
008bb000-008e9000 rwxp 002bb000 08:03 1770606                            /usr/local/bin/vasp
008e9000-00c32000 rwxp 008e9000 00:00 0
11bc5000-123cc000 rwxp 11bc5000 00:00 0                                  [heap]
3fd6c00000-3fd6c1c000 r-xp 00000000 08:03 3204371                        /lib64/ld-2.5.so
3fd6e1b000-3fd6e1c000 r-xp 0001b000 08:03 3204371                        /lib64/ld-2.5.so
3fd6e1c000-3fd6e1d000 rwxp 0001c000 08:03 3204371                        /lib64/ld-2.5.so
3fd7400000-3fd754d000 r-xp 00000000 08:03 3204372                        /lib64/libc-2.5.so
3fd754d000-3fd774d000 ---p 0014d000 08:03 3204372                        /lib64/libc-2.5.so
3fd774d000-3fd7751000 r-xp 0014d000 08:03 3204372                        /lib64/libc-2.5.so
3fd7751000-3fd7752000 rwxp 00151000 08:03 3204372                        /lib64/libc-2.5.so
3fd7752000-3fd7757000 rwxp 3fd7752000 00:00 0
3fd7c00000-3fd7c82000 r-xp 00000000 08:03 3204376                        /lib64/libm-2.5.so
3fd7c82000-3fd7e81000 ---p 00082000 08:03 3204376                        /lib64/libm-2.5.so
3fd7e81000-3fd7e82000 r-xp 00081000 08:03 3204376                        /lib64/libm-2.5.so
3fd7e82000-3fd7e83000 rwxp 00082000 08:03 3204376                        /lib64/libm-2.5.so
3fd8000000-3fd8016000 r-xp 00000000 08:03 3204374                        /lib64/libpthread-2.5.so
3fd8016000-3fd8215000 ---p 00016000 08:03 3204374                        /lib64/libpthread-2.5.so
3fd8215000-3fd8216000 r-xp 00015000 08:03 3204374                        /lib64/libpthread-2.5.so
3fd8216000-3fd8217000 rwxp 00016000 08:03 3204374                        /lib64/libpthread-2.5.so
3fd8217000-3fd821b000 rwxp 3fd8217000 00:00 0
3fd8800000-3fd8807000 r-xp 00000000 08:03 3204377                        /lib64/librt-2.5.so
3fd8807000-3fd8a07000 ---p 00007000 08:03 3204377                        /lib64/librt-2.5.so
3fd8a07000-3fd8a08000 r-xp 00007000 08:03 3204377                        /lib64/librt-2.5.so
3fd8a08000-3fd8a09000 rwxp 00008000 08:03 3204377                        /lib64/librt-2.5.so
3fdd000000-3fdd00d000 r-xp 00000000 08:03 3202051                        /lib64/libgcc_s-4.1.2-20080825.so.1
3fdd00d000-3fdd20d000 ---p 0000d000 08:03 3202051                        /lib64/libgcc_s-4.1.2-20080825.so.1
3fdd20d000-3fdd20e000 rwxp 0000d000 08:03 3202051                        /lib64/libgcc_s-4.1.2-20080825.so.1
2b2cd10a3000-2b2cd10a9000 rwxp 2b2cd10a3000 00:00 0
2b2cd10ca000-2b2ce4648000 rwxp 2b2cd10ca000 00:00 0
7fff16e0a000-7fff16e1f000 rwxp 7ffffffea000 00:00 0                      [stack]
ffffffffff600000-ffffffffffe00000 ---p 00000000 00:00 0                  [vdso]
Aborted (core dumped)
[qltang@qltang1 pgi]$
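
The Before/After lines printed by stacksize() show cur=-1 and hard=-1. Since limit.c itself is not shown, this is only a guess, but -1 is presumably RLIM_INFINITY printed as a signed integer, which would mean the stack limit was already unlimited before the call. The same limits can be cross-checked from the shell that launches vasp. A minimal sketch, assuming bash or another shell whose ulimit builtin accepts -s (the variable names are illustrative):

```shell
#!/bin/sh
# Report the soft and hard stack limits of the current shell.
# Each value is either "unlimited" or a size in the shell's units.
soft_before=$(ulimit -S -s)
hard=$(ulimit -H -s)
echo "stack soft limit: $soft_before"
echo "stack hard limit: $hard"

# Raise the soft limit as far as the hard limit allows (no root needed).
ulimit -S -s "$hard"
soft_after=$(ulimit -S -s)
echo "stack soft limit now: $soft_after"
```

If both values already read unlimited, raising the stack limit is unlikely to change the behaviour. The glibc message free(): invalid next size generally indicates corrupted heap bookkeeping, so it would point more towards a memory overwrite than towards the stack limit.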

So I would like to keep asking: where does my problem lie? Is it related to the RHEL 5.4 I installed?
Thanks!
Dr.Qian-LinTang
Post #16, 2010-06-25 11:14:18

gump_813276

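Did you try what I suggested last time?
The message above says it is a stdin problem,
so I think your first command is the right one: mpiexec -n 8 vasp.pgi >out&
Try it, and let us know if there is still a problem.
Post #2, 2010-06-17 12:19:32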

wnryc

It is not a problem with the mpiexec command. I tried all of the forms above and got the same error message.
Dr.Qian-LinTang
Post #3, 2010-06-17 15:23:13

valenhou001

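#!/bin/sh
# List the hosts in the running mpd ring.
mpdtrace -l
# Check the connectivity.
mpdringtest 100

# Replace /path/to/vasp with the actual path to the vasp binary.
mpiexec -n 2 /path/to/vasp > out 2>&1

# Shut down all mpd daemons in the ring.
mpdallexit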


Try the above.
Post #4, 2010-06-17 15:33:53