Views: 2765 | Replies: 21
[Discussion]
[Help] Running VASP: mpich2 sometimes breaks down (5 participants so far)
I posted the same question earlier, but the problem was not solved. It is fairly urgent, so I hope someone can help. Thanks!

I am running parallel VASP on RHEL 5.4 with mpich2-1.2.1p1, pgi-9.0.1 and two Xeon E5504 (Intel) CPUs. Some jobs run in parallel without trouble, using any of these commands:

    mpiexec -n 8 vasp.pgi >out&
    mpiexec -n 8 vasp.pgi out&
    mpiexec -n 8 vasp.pgi

Other jobs will not run, even though my mpich2-1.2.1p1 and pgi 9.0.1 installations are fine. In the failing case, after the program has read INCAR, POTCAR, POSCAR and KPOINTS, the following errors appear on screen:

    ----------------------------------------------------------------------
    running on 8 nodes
    distr: one band on 1 nodes, 8 groups
    vasp.4.6.21 23Feb03 complex
    POSCAR found : 3 types and 30 ions
    LDA part: xc-table for Ceperly-Alder, Vosko type interpolation para-ferro
    POSCAR, INCAR and KPOINTS ok, starting setup
    WARNING: wrap around errors must be expected
    mpiexec_qltang1 (handle_stdin_input 1089): stdin problem; if pgm is run in background, redirect from /dev/null
    mpiexec_qltang1 (handle_stdin_input 1090): e.g.: mpiexec -n 4 a.out < /dev/null &
    [the two lines above repeat several more times]
    FFT: planning ... 2
    reading WAVECAR
    [the same two stdin lines repeat many more times]
    WARNING: random wavefunctions but no delay for mixing, default for NELMDL
    entering main loop
    N E dE d eps ncg rms rms(c)
    rank 6 in job 36 qltang1_54199 caused collective abort of all ranks
    exit status of rank 6: killed by signal 9
    rank 3 in job 36 qltang1_54199 caused collective abort of all ranks
    exit status of rank 3: killed by signal 9

I hope someone can help. Thanks!
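The repeated "stdin problem" lines are the advice mpiexec itself prints: when the job runs in the background, redirect stdin from /dev/null. A minimal sketch of that launch pattern follows. The helper name run_detached is hypothetical, and the sketch only addresses the stdin messages; it is not expected to explain the "killed by signal 9" aborts, which look like a separate problem.

```shell
#!/bin/sh
# Launch a command in the background with stdin detached from the terminal.
# Usage: run_detached LOGFILE COMMAND [ARGS...]
# run_detached is a hypothetical helper name, not part of mpich2 or VASP.
run_detached() {
    log=$1
    shift
    # stdin comes from /dev/null, as the mpiexec message recommends;
    # stdout and stderr both go to the log file.
    "$@" < /dev/null > "$log" 2>&1 &
    # Print the PID of the background job so it can be monitored.
    echo $!
}

# Example (commented out; assumes mpiexec and vasp.pgi are on PATH):
# run_detached out mpiexec -n 8 vasp.pgi
```

Written out directly, the command would be mpiexec -n 8 vasp.pgi < /dev/null > out 2>&1 &, which is the form the error message itself suggests.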
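Floor 2, 2010-06-17 12:19:32

Floor 3, 2010-06-17 15:23:13
valenhou001
Supreme Woodworm (Professional Writer)
- 1ST top posts: 13
- Helpful answers: 241 (Undergraduate)
- Coins: 25701.7
- Coins given away: 602
- Flowers: 166
- Posts: 3782
- Online: 873.8 hours
- User ID: 1007127
- Registered: 2010-04-27
- Field: Condensed matter properties II: electronic structure
Floor 4, 2010-06-17 15:33:53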
Following the suggestion, I did the following:

    [qltang@qltang1 pgi]$ /bin/sh
    sh-3.2$ mpdtrace -l
    qltang1_54199 (127.0.0.1)
    sh-3.2$ mpdringtest 100
    time for 100 loops = 0.0500471591949 seconds
    sh-3.2$ mpiexec -n 2 /usr/local/bin/ >out 2>&1
    sh-3.2$ mpiexec -n 2 /usr/local/bin/vasp.pgi >out 2>&1
    sh-3.2$

The result is the same. VASP reads the input files and then exits:

    running on 2 nodes
    distr: one band on 1 nodes, 2 groups
    vasp.4.6.21 23Feb03 complex
    POSCAR found : 3 types and 48 ions
    LDA part: xc-table for Ceperly-Alder, Vosko type interpolation para-ferro
    POSCAR, INCAR and KPOINTS ok, starting setup
    WARNING: wrap around errors must be expected
    FFT: planning ... 10
    reading WAVECAR
    entering main loop
    N E dE d eps ncg rms rms(c)
    rank 1 in job 63 qltang1_54199 caused collective abort of all ranks
    exit status of rank 1: killed by signal 11

I also asked mpich2 support. They said the cause is probably not mpich2 and pointed me at core dumps: "'ulimit -c unlimited' is the usual means to enable core dumps. If that's not working for you, then either your VASP program isn't dumping core or core dumps must be enabled some other way. You'll have to google for the appropriate way to enable them on your platform."

So I compiled a serial VASP. I first ran:

    ulimit -c unlimited
    ulimit -s unlimited

and then ran the serial vasp.pgi.serial. Again VASP exited right after reading the input files, this time with the following message:

    [qltang@qltang1 pgi2]$ vasp.pgi.serial
    vasp.4.6.21 23Feb03 complex
    POSCAR found : 3 types and 48 ions
    LDA part: xc-table for Ceperly-Alder, Vosko type interpolation para-ferro
    POSCAR, INCAR and KPOINTS ok, starting setup
    WARNING: wrap around errors must be expected
    FFT: planning ... 16
    reading WAVECAR
    entering main loop
    N E dE d eps ncg rms rms(c)
    *** glibc detected *** vasp.pgi.serial: free(): invalid next size (fast): 0x0000000005310760 ***
    ======= Backtrace: =========
    /lib64/libc.so.6[0x3fd74722ef]
    /lib64/libc.so.6(cfree+0x4b)[0x3fd747273b]
    vasp.pgi.serial[0x4f6176]
    ======= Memory map: ========
    00400000-006bc000 r-xp 00000000 08:03 1770734 /usr/local/bin/vasp.pgi.serial
    008bb000-008e8000 rwxp 002bb000 08:03 1770734 /usr/local/bin/vasp.pgi.serial
    008e8000-00c32000 rwxp 008e8000 00:00 0
    05243000-05a4a000 rwxp 05243000 00:00 0 [heap]
    3fd6c00000-3fd6c1c000 r-xp 00000000 08:03 3204371 /lib64/ld-2.5.so
    3fd6e1b000-3fd6e1c000 r-xp 0001b000 08:03 3204371 /lib64/ld-2.5.so
    3fd6e1c000-3fd6e1d000 rwxp 0001c000 08:03 3204371 /lib64/ld-2.5.so
    3fd7400000-3fd754d000 r-xp 00000000 08:03 3204372 /lib64/libc-2.5.so
    3fd754d000-3fd774d000 ---p 0014d000 08:03 3204372 /lib64/libc-2.5.so
    3fd774d000-3fd7751000 r-xp 0014d000 08:03 3204372 /lib64/libc-2.5.so
    3fd7751000-3fd7752000 rwxp 00151000 08:03 3204372 /lib64/libc-2.5.so
    3fd7752000-3fd7757000 rwxp 3fd7752000 00:00 0
    3fd7c00000-3fd7c82000 r-xp 00000000 08:03 3204376 /lib64/libm-2.5.so
    3fd7c82000-3fd7e81000 ---p 00082000 08:03 3204376 /lib64/libm-2.5.so
    3fd7e81000-3fd7e82000 r-xp 00081000 08:03 3204376 /lib64/libm-2.5.so
    3fd7e82000-3fd7e83000 rwxp 00082000 08:03 3204376 /lib64/libm-2.5.so
    3fd8000000-3fd8016000 r-xp 00000000 08:03 3204374 /lib64/libpthread-2.5.so
    3fd8016000-3fd8215000 ---p 00016000 08:03 3204374 /lib64/libpthread-2.5.so
    3fd8215000-3fd8216000 r-xp 00015000 08:03 3204374 /lib64/libpthread-2.5.so
    3fd8216000-3fd8217000 rwxp 00016000 08:03 3204374 /lib64/libpthread-2.5.so
    3fd8217000-3fd821b000 rwxp 3fd8217000 00:00 0
    3fd8800000-3fd8807000 r-xp 00000000 08:03 3204377 /lib64/librt-2.5.so
    3fd8807000-3fd8a07000 ---p 00007000 08:03 3204377 /lib64/librt-2.5.so
    3fd8a07000-3fd8a08000 r-xp 00007000 08:03 3204377 /lib64/librt-2.5.so
    3fd8a08000-3fd8a09000 rwxp 00008000 08:03 3204377 /lib64/librt-2.5.so
    3fdd000000-3fdd00d000 r-xp 00000000 08:03 3202051 /lib64/libgcc_s-4.1.2-20080825.so.1
    3fdd00d000-3fdd20d000 ---p 0000d000 08:03 3202051 /lib64/libgcc_s-4.1.2-20080825.so.1
    3fdd20d000-3fdd20e000 rwxp 0000d000 08:03 3202051 /lib64/libgcc_s-4.1.2-20080825.so.1
    2abf81586000-2abf8158c000 rwxp 2abf81586000 00:00 0
    2abf815ad000-2abf94b2b000 rwxp 2abf815ad000 00:00 0
    7fff8fa7d000-7fff8fa92000 rwxp 7ffffffea000 00:00 0 [stack]
    ffffffffff600000-ffffffffffe00000 ---p 00000000 00:00 0 [vdso]
    Aborted (core dumped)
    [qltang@qltang1 pgi2]$

I do not know where the cause lies. The puzzling part is that some serial jobs run and give results, while for other serial jobs vasp stops right after reading the input files with the error above. I hope this can be solved soon. Thanks!
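Since the post relies on ulimit -c unlimited and ulimit -s unlimited having taken effect, it may help to confirm which limits a process actually inherits. The sketch below prints the soft and hard stack and core limits of the current shell; the helper name show_limits is hypothetical. Note that the glibc message "free(): invalid next size" usually points at heap corruption (for example an out-of-bounds write) rather than a resource limit, so the limits are only one thing to rule out.

```shell
#!/bin/sh
# Print the soft and hard limits that a program started from this shell
# would inherit. show_limits is a hypothetical helper name.
show_limits() {
    printf 'stack soft=%s hard=%s\n' "$(ulimit -S -s)" "$(ulimit -H -s)"
    printf 'core  soft=%s hard=%s\n' "$(ulimit -S -c)" "$(ulimit -H -c)"
}

show_limits

# Assumption: under the MPD process manager the ranks may inherit their
# limits from the mpd daemon rather than from this shell. A possible
# check (commented out; assumes mpiexec is on PATH):
# mpiexec -n 2 sh -c 'ulimit -S -s; ulimit -S -c' < /dev/null
```

If the values reported by the ranks differ from those of the interactive shell, the ulimit commands typed before mpiexec are not reaching the VASP processes.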

Floor 5, 2010-06-17 17:16:38
valenhou001
Floor 6, 2010-06-17 18:55:03
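I suspect this is a stack size limit problem. I have tried the method below and it works well. Create a file called limit.c with the following content:

    #include <stdio.h>
    #include <sys/time.h>
    #include <sys/resource.h>

    /* Called from Fortran as CALL stacksize(); the trailing underscore
       matches the Fortran compiler's name mangling. */
    void stacksize_(void)
    {
        int res;
        struct rlimit rlim;

        /* Report the current limits (an unlimited value prints as -1). */
        getrlimit(RLIMIT_STACK, &rlim);
        printf("Before: cur=%ld,hard=%ld\n", (long)rlim.rlim_cur, (long)rlim.rlim_max);

        /* Request an unlimited stack; res is -1 if the system refuses. */
        rlim.rlim_cur = RLIM_INFINITY;
        rlim.rlim_max = RLIM_INFINITY;
        res = setrlimit(RLIMIT_STACK, &rlim);

        getrlimit(RLIMIT_STACK, &rlim);
        printf("After: res=%d,cur=%ld,hard=%ld\n", res, (long)rlim.rlim_cur, (long)rlim.rlim_max);
    }

Put this file in the same directory as the other VASP source files and compile it with:

    cc -c -Wall -O2 limit.c

At the beginning of main.F add:

    CALL stacksize()

It should go right after all the declarations. Then, in the makefile, append limit.o to the end of the long SOURCE list. Give it a try.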
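Before patching and rebuilding VASP, the same idea can be tried from the shell. This is only a sketch under assumptions: the helper name with_big_stack is hypothetical, and it raises the soft stack limit to the current hard limit rather than forcing both to unlimited as limit.c does, so it needs no extra privileges.

```shell
#!/bin/sh
# Run a command with the soft stack limit raised to the hard limit.
# with_big_stack is a hypothetical helper name, not part of VASP or mpich2.
with_big_stack() {
    (
        # The subshell keeps the change from leaking into the calling shell.
        ulimit -S -s "$(ulimit -H -s)"
        exec "$@"
    )
}

# Example (commented out; assumes vasp.pgi.serial is on PATH):
# with_big_stack vasp.pgi.serial > out 2>&1
```

If the job still aborts with the stack limit raised this way, the stack size is probably not the cause, and the limit.c patch is unlikely to change the outcome.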
Floor 7, 2010-06-17 22:35:59
valenhou001
Floor 8, 2010-06-18 10:32:35
The test system is not large (16 atoms), so I would rather sort out the serial problem first. Even in serial, VASP runs for some systems and fails for others with the error mentioned above. Could someone check whether my serial makefile is appropriate?

Machine configuration:
1) Xeon E5504 CPU 2.0 GHz (two quad-core CPUs, 64-bit), memory 2*4 GB, cache size = 4096 KB
2) RHEL 5.4 (64-bit): Linux qltang1 2.6.18-164.el5 #1 SMP Tue Aug 18 15:51:48 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux
3) mpich2: 1.2.1p1
4) pgi: 9.0.1

The makefile content is as follows (unmodified parts are not listed):

    .SUFFIXES: .inc .f .f90 .F
    SUFFIX=.f
    FC=pgf90
    FCL=$(FC)
    CPP_ = ./preprocess <$*.F | /usr/bin/cpp -P -C -traditional >$*$(SUFFIX)
    CPP = $(CPP_) -DHOST=\"LinuxPgi\" \
          -Dkind8 -DNGXhalf -DCACHE_SIZE=4096 -DPGF90 -Davoidalloc \
          -DRPROMU_DGEMV
    FFLAGS = -Mfree -Mx,119,0x200000
    OFLAG = -O2 -tp p7-64
    OFLAG_HIGH = $(OFLAG)
    OBJ_HIGH =
    OBJ_NOOPT =
    DEBUG = -g -O0
    INLINE = $(OFLAG)
    BLAS= -L/usr/local/pgi-9.0.1/linux86-64/9.0-1/lib -lblas
    LAPACK= -L/usr/local/pgi-9.0.1/linux86-64/9.0-1/lib -llapack
    LIB = -L../vasp.4.lib -ldmy \
          ../vasp.4.lib/linpack_double.o $(LAPACK) \
          $(BLAS)
    LINK =
    FFT3D = fft3dfurth.o fft3dlib.o

With the serial vasp built from this makefile, some jobs run and others fail with the problem above. Please help me check my makefile.

[ Last edited by wnryc on 2010-6-18 at 18:59 ]

Floor 9, 2010-06-18 18:55:30
Floor 10, 2010-06-18 18:56:28






