| Views: 1214 | Replies: 4 | ||
relonfbq, Gold Bug (Official Writer)
|
[Help]
How do I fix "0: copyout memcpyAsync FAILED: 74 (misaligned address)"?
|
|
|
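Today I ran a Fortran + CUDA program and got the error "0: copyout memcpyAsync (host= , dev= , Size= ,Stream=) FAILED: 74 (misaligned address)". What is going on?
This program used to compile and run in my own lab. I have since moved to a different site, where it still compiles and produces an executable, but it fails with this error and I don't know how to fix it.
In my own lab it was built with cuda5.5 and pgfortran. The only change now is that it is built with cuda7.0 (I have no choice, only cuda7.0 is available here, although I would prefer to go back to cuda5.5). Is there a fix? I can't freely install compilers on the machines here, and even if I could, I wouldn't know how to set it up. Has anyone run into a similar problem, and how did you solve it? |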
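Because kernel launches are asynchronous, a fault raised inside a kernel (a misaligned address, for example) is often reported by the next runtime call, which here happens to be the copy-out. The following is a minimal sketch, not taken from the poster's program, of checking for errors immediately after a launch so that the failing kernel can be identified. The module `demo_kernels`, the kernel `fill`, and the array names are hypothetical; only `cudaGetLastError`, `cudaDeviceSynchronize`, and `cudaGetErrorString` from the `cudafor` module are assumed.

```fortran
module demo_kernels
  use cudafor
  implicit none
contains
  ! Hypothetical stand-in kernel: sets every element of a to 1.0
  attributes(global) subroutine fill(a, n)
    integer, value :: n
    real, device :: a(n)
    integer :: i
    i = (blockidx%x - 1) * blockdim%x + threadidx%x
    if (i <= n) a(i) = 1.0
  end subroutine fill
end module demo_kernels

program check_after_launch
  use cudafor
  use demo_kernels
  implicit none
  integer, parameter :: n = 1024
  real, device, allocatable :: a_d(:)
  real :: a_h(n)
  integer :: istat

  allocate(a_d(n))
  call fill<<<(n + 255) / 256, 256>>>(a_d, n)

  ! Errors detected at launch time (bad configuration, etc.)
  istat = cudaGetLastError()
  if (istat /= cudaSuccess) print *, 'launch error: ', trim(cudaGetErrorString(istat))

  ! Wait for the kernel to finish so that faults raised during execution
  ! are reported here rather than by a later, unrelated copy
  istat = cudaDeviceSynchronize()
  if (istat /= cudaSuccess) print *, 'execution error: ', trim(cudaGetErrorString(istat))

  a_h = a_d          ! copy-out; should only be reached with a healthy context
  deallocate(a_d)
end program check_after_launch
```

Placing the two checks after each kernel launch while debugging (and removing the synchronization afterwards, since it serializes execution) is one way to tell a faulty kernel apart from a faulty copy.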
relonfbq
Gold Bug (Official Writer)
- Helps given: 1 (Kindergarten)
- Coins: 1294.3
- Coins given away: 1140
- Red flowers: 2
- Posts: 301
- Online: 99.7 hours
- User ID: 744333
- Registered: 2009-04-09
- Specialty: Microstructure of metallic materials
#4 2016-03-31 18:54:29
relonfbq
|
This error has been around for so long and is still not solved.
========= Program hit cudaErrorInvalidValue (error 11) due to "invalid argument" on CUDA API call to cudaMemcpyAsync.
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:/usr/lib64/nvidia/libcuda.so.1 [0x2f31b3]
=========     Host Frame:/opt/pgi/linux86-64/2015/cuda/7.5/lib64/libcudart.so.7.5 (cudaMemcpyAsync + 0x1b3) [0x435d3]
=========     Host Frame:/opt/pgi/linux86-64/15.10/lib/libcudafor.so (pgf90_dev_copyout_async + 0x38) [0x1a7b0]
=========     Host Frame:/opt/pgi/linux86-64/15.10/lib/libcudafor.so (cudamemcpyasync3r8out_ + 0x1c) [0x2346c]
=========     Host Frame:./fasMD.exe [0x13de1f]
=========     Host Frame:./fasMD.exe [0x13b8f6]
=========     Host Frame:./fasMD.exe [0x12a62f]
=========     Host Frame:./fasMD.exe [0x51af]
=========     Host Frame:./fasMD.exe [0x7a2e]
=========     Host Frame:./fasMD.exe [0x3b34]
=========     Host Frame:/lib64/libc.so.6 (__libc_start_main + 0xfd) [0x1ed5d]
=========     Host Frame:./fasMD.exe [0x3a29]
=========
========= ERROR SUMMARY: 1 error |
#2 2016-03-30 21:42:29
relonfbq
#3 2016-03-31 17:43:55
relonfbq
|
Let me restate the problem systematically. Debugging shows that the program terminates at the statement:
ERR = cudaMemcpyAsync(DEN,m_DEN,dm_NPRT)
DEN and m_DEN are both declared and allocated; they are device and host memory respectively. cudaMemcpyAsync is also called before this line, and those earlier calls succeed. I don't know why.
In addition, this program runs on my original machine (CUDA 5.5/K20). I recently moved to a different site and a different machine (CUDA 7.5/GK110B), and that is when the problem appeared.
---------------------------------------------------------------------------------------------------------------------------------
I then ran cuda-memcheck and got the following error report:
-------
......
========= Invalid __shared__ write of size 16
=========     at 0x00000250 in nbl_cal_neighborelist_devkernel2c_
=========     by thread (0,0,0) in block (14,0,0)
=========     Address 0x00000004 is misaligned
=========     Saved host backtrace up to driver entry point at kernel launch time
=========     Host Frame:/usr/lib64/nvidia/libcuda.so.1 (cuLaunchKernel + 0x2cd) [0x15865d]
=========     Host Frame:/opt/pgi/linux86-64/2015/cuda/7.5/lib64/libcudart.so.7.5 [0x146ad]
=========     Host Frame:/opt/pgi/linux86-64/2015/cuda/7.5/lib64/libcudart.so.7.5 (cudaLaunch + 0x143) [0x2ece3]
=========     Host Frame:./fasMD.exe [0xe8260]
=========     Host Frame:./fasMD.exe [0xe7946]
=========     Host Frame:./fasMD.exe [0xe3315]
=========     Host Frame:./fasMD.exe [0x5002]
=========     Host Frame:./fasMD.exe [0x77fe]
=========     Host Frame:./fasMD.exe [0x3b34]
=========     Host Frame:/lib64/libc.so.6 (__libc_start_main + 0xfd) [0x1ed5d]
=========     Host Frame:./fasMD.exe [0x3a29]
=========
Checking A003
0: copyout MemcpyAsync (host=0x203707e00, dev=0x704b27e00, size=32000, stream=0) FAILED: 4(unspecified launch failure)
========= ERROR SUMMARY: 168 errors
[xxxx@oums-sfgpu MD]$ cuda-memcheck ./fasMD.exe ipa.thfu 0 1
-------
At this point a friend suggested that the problem might not be in cudaMemcpyAsync at all, which seems reasonable, so I looked at the kernel subroutine cal_neighborelist_devkernel2c. With that kernel commented out, the program runs.
However, I have checked the kernel carefully and cannot find what is wrong. By disabling parts of it and adding some demo statements, I found that the shared variables inside it may be the cause, but I still cannot pin the problem down.
I am posting the kernel below and hope an expert can answer. Thanks.
--------------------------------------------------------------------------------------------------------------
use constant
implicit none
!
!--- DUMMY VARIABLES
integer, value::NBPC,IP0, NC,NCX,NCY,NCZ, PDX, PDY, PDZ, NPART, CFROM, CTO, mxNAPDEV, mxKVOIS, IA1th0
real(KINDDF), value::cra011,cra021,cra031,cra012,cra022,cra032,cra013,cra023,cra033
real(KINDDF), device::XP(3,NPART)
integer, device::ITYP(NPART), NAC(NC), IA1th(NC)
integer,device::KVOIS(mxNAPDEV)
integer,device::INDI(mxKVOIS,mxNAPDEV)
integer(1),device::mvl(mxKVOIS,mxNAPDEV)

!--- Local variables
!nonshared by threads
real(KINDSF)::POS(3), SEP(3)
integer::IB, IB0, IT, IA, IA0,IA00, JA, NN, I, J, K, ITY

!variables share by all thread
integer,shared::NB, IC, IS0, STARTA, NCXY, NCXYZ,IX0, IY0, IZ0,IC0, NACC0
integer,shared::NS,NACC, IAC, IACE, FROM, TO
integer, shared, dimension(mp_NNC)::CID, IX,IY, IZ, OUT
integer(1), shared, dimension(mp_NNC)::I_mvl
real(KINDSF), shared, dimension(3,mp_NNC)::CXYZ
real(KINDSF), shared, dimension(3,mp_BLOCKSIZE)::SPOS
real(KINDSF), shared::RC2(mp_MXGROUP,mp_MXGROUP)
integer, shared, dimension(mp_BLOCKSIZE)::JTY

IB = (blockidx%y-1) * griddim%x + blockidx%x-1
NB = blockdim%x*blockdim%y
IB0 = IB/NBPC
IP0 = (IB-IB0*NBPC)*NB
IB0 = IB0 + CFROM-1
if(IB0 .GE. CTO) return

IT = (threadidx%y-1)*blockdim%x + threadidx%x
if(IT .EQ. 1) then
   RC2(1:mp_MXGROUP,1:mp_MXGROUP) = dcm_RU2(1:mp_MXGROUP,1:mp_MXGROUP)
   NCXY = NCX*NCY
   NCXYZ = NCXY*NCZ
   IS0 = IB0/NCXYZ
   IC = IB0-IS0*NCXYZ
   IZ0 = IC/NCXY
   IY0 = (IC-IZ0*NCXY)/NCX
   IX0 = IC-IZ0*NCXY-IY0*NCX
   IZ0 = IZ0 + 1
   IY0 = IY0 + 1
   IX0 = IX0 + 1
   IC = IB0 + 1
   STARTA = IA1th0 ! IA1th(CFROM)
   NACC0 = NAC(IC)
end if
call syncthreads()
if(NACC0 .LE. 0) return

if(IT .LE. mp_NNC) then
   OUT(IT)= 0
   IZ(IT) = IZ0+mp_NIZ(IT)
   IY(IT) = IY0+mp_NIY(IT)
   IX(IT) = IX0+mp_NIX(IT)
   I_mvl(IT) = 0
   CXYZ(1:3,IT) = 0.0d0

   If(PDX .AND. IT.GT.1) Then
      IF( IX(IT).GT.NCX )THEN
         IX(IT) = 1
         CXYZ(1,IT) = CXYZ(1,IT) + cra011
         CXYZ(2,IT) = CXYZ(2,IT) + cra021
         CXYZ(3,IT) = CXYZ(3,IT) + cra031
         I_mvl(IT) = ior( I_mvl(IT) , 1)
      ELSE IF (IX(IT).LT.1) THEN
         IX(IT) = NCX
         CXYZ(1,IT) = CXYZ(1,IT) - cra011
         CXYZ(2,IT) = CXYZ(2,IT) - cra021
         CXYZ(3,IT) = CXYZ(3,IT) - cra031
         I_mvl(IT) = ior( I_mvl(IT) , 2)
      ENDIF
   End If

   If(PDY .AND. IT.GT.1) Then
      IF( IY(IT).GT.NCY )THEN
         IY(IT) = 1
         CXYZ(1,IT) = CXYZ(1,IT) + cra012
         CXYZ(2,IT) = CXYZ(2,IT) + cra022
         CXYZ(3,IT) = CXYZ(3,IT) + cra032
         I_mvl(IT) = ior( I_mvl(IT) , 4)
      ELSE IF (IY(IT).LT.1) THEN
         IY(IT) = NCY
         CXYZ(1,IT) = CXYZ(1,IT) - cra012
         CXYZ(2,IT) = CXYZ(2,IT) - cra022
         CXYZ(3,IT) = CXYZ(3,IT) - cra032
         I_mvl(IT) = ior( I_mvl(IT) , 8)
      ENDIF
   End If

   If(PDZ .AND. IT.GT.1) Then
      IF( IZ(IT).GT.NCZ )THEN
         IZ(IT) = 1
         CXYZ(1,IT) = CXYZ(1,IT) + cra013
         CXYZ(2,IT) = CXYZ(2,IT) + cra023
         CXYZ(3,IT) = CXYZ(3,IT) + cra033
         I_mvl(IT) = ior( I_mvl(IT) , 16)
      ELSE IF (IZ(IT).LT.1) THEN
         IZ(IT) = NCZ
         CXYZ(1,IT) = CXYZ(1,IT) - cra013
         CXYZ(2,IT) = CXYZ(2,IT) - cra023
         CXYZ(3,IT) = CXYZ(3,IT) - cra033
         I_mvl(IT) = ior( I_mvl(IT) , 32)
      ENDIF
   End If

   IF( IX(IT) .GT. NCX .OR. IX(IT) .LT. 1) OUT(IT) = 1
   IF( IY(IT) .GT. NCY .OR. IY(IT) .LT. 1) OUT(IT) = 1
   IF( IZ(IT) .GT. NCZ .OR. IZ(IT) .LT. 1) OUT(IT) = 1
   IF(OUT(IT) .EQ. 0) then
      CID(IT) = NCXY*(IZ(IT)-1)+NCX*(IY(IT)-1)+IX(IT)+IS0*NCXYZ
   ELSE
      CID(IT) = 0
   END IF
END IF
call syncthreads()

IA = (IT-1) + IA1th(IC) + IP0
IA0 = IA - STARTA + 1
IA00 = IA - IA1th(IC)+1
if(IA00.LE.NACC0) then
   POS(1:3)=XP(1:3,IA)
   ITY = ITYP(IA)
   NN = 0
endif

K=1
NACC = NAC(CID(K))
NS = (NACC-1)/NB+1
IAC = IA1th(CID(K))
IACE = IAC + NACC -1
DO J=1, NS, 1
   FROM = min((J-1)*NB+IAC,IACE)
   TO = min(FROM+NB-1, IACE)
   if(IT+FROM-1<=IACE)then
      SPOS(1:3,IT) = XP(1:3, IT+FROM-1) + CXYZ(1:3,K)
      JTY(IT) = ITYP(IT+FROM-1)
   end if
   call syncthreads()
   IF(IA00.LE.NACC0) then
      DO I=FROM, TO
         JA = I-FROM+1
         SEP(1:3) = POS(1:3) - SPOS(1:3,JA)
         if( SEP(1)*SEP(1)+SEP(2)*SEP(2)+SEP(3)*SEP(3) .LE. RC2(ITY,JTY(JA)) ) then
            if(I.NE.IA) then
               if(NN<mxKVOIS) then
                  NN = NN + 1
                  INDI(NN,IA0) = I
                  mvl(NN,IA0) = I_mvl(K)
               end if
            end if
         end if
      END DO
   END IF
   call syncthreads()
END DO

DO K=2, mp_NNC
   IF(OUT(K)) cycle
   NACC = NAC(CID(K))
   NS = min((NACC-1)/NB+1, NACC)
   IAC = IA1th(CID(K))
   IACE = IAC + NACC -1
   call syncthreads()
   DO J=1, NS
      FROM = min((J-1)*NB+IAC,IACE)
      TO = min(FROM+NB-1, IACE)
      if(IT+FROM-1<=IACE)then
         SPOS(1:3,IT) = XP(1:3, IT+FROM-1) + CXYZ(1:3,K)
         JTY(IT) = ITYP(IT+FROM-1)
      endif
      call syncthreads()
      IF(IA00.LE.NACC0) then
         DO I=FROM, TO
            JA = I-FROM+1
            SEP(1:3) = POS(1:3) - SPOS(1:3,JA)
            if( SEP(1)*SEP(1)+SEP(2)*SEP(2)+SEP(3)*SEP(3) .LE. RC2(ITY,JTY(JA)) ) then
               if(NN<mxKVOIS) then
                  NN = NN + 1
                  INDI(NN, IA0) = I
                  mvl(NN, IA0) = I_mvl(K)
               end if
            end if
         END DO
      END IF
      call syncthreads()
   END DO
END DO

IF(IA00 .LE. NACC0) then
   KVOIS(IA0) = NN
END IF
return |
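For readers unfamiliar with the cuda-memcheck message "Invalid __shared__ write of size 16 ... Address 0x00000004 is misaligned": GPU loads and stores must be naturally aligned, meaning an N-byte access has to start at an address that is a multiple of N. The short host-side sketch below only reproduces that arithmetic for the two numbers in the report; the program and variable names are hypothetical and it says nothing about which statement in the kernel produced the store.

```fortran
program check_alignment
  use iso_fortran_env, only: int64
  implicit none
  ! Values copied from the cuda-memcheck report:
  ! a 16-byte write to shared-memory address 0x00000004
  integer(int64), parameter :: addr = 4_int64
  integer(int64), parameter :: access_size = 16_int64
  integer(int64) :: offset

  ! Natural alignment requires mod(address, access size) == 0
  offset = mod(addr, access_size)
  if (offset == 0_int64) then
     print '(a)', 'aligned'
  else
     print '(a,i0,a,i0,a)', 'misaligned: address is ', offset, &
           ' bytes past a ', access_size, '-byte boundary'
  end if
end program check_alignment
```

Since no single variable in the kernel is 16 bytes wide, one possible reading (an assumption, not something the thread confirms) is that the compiler combined several adjacent 4-byte shared stores into one 16-byte store; comparing builds at a lower optimization level or with a different compiler/CUDA version would be one way to test that.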
#5 2016-04-03 01:01:17