Views: 1259  |  Replies: 12
[Reward] This post has been rated 11 times; the author pkusiyuan gained 8.6 coins

pkusiyuan

Silver Bug (Official Writer)


[Resource] 2010Programming.Massively.Parallel.Processors

Contents
Preface
Acknowledgments
Dedication
CHAPTER 1 INTRODUCTION
1.1 GPUs as Parallel Computers
1.2 Architecture of a Modern GPU
1.3 Why More Speed or Parallelism?
1.4 Parallel Programming Languages and Models
1.5 Overarching Goals
1.6 Organization of the Book
CHAPTER 2 HISTORY OF GPU COMPUTING
2.1 Evolution of Graphics Pipelines
2.1.1 The Era of Fixed-Function Graphics Pipelines
2.1.2 Evolution of Programmable Real-Time Graphics
2.1.3 Unified Graphics and Computing Processors
2.1.4 GPGPU: An Intermediate Step
2.2 GPU Computing
2.2.1 Scalable GPUs
2.2.2 Recent Developments
2.3 Future Trends
CHAPTER 3 INTRODUCTION TO CUDA
3.1 Data Parallelism
3.2 CUDA Program Structure
3.3 A Matrix–Matrix Multiplication Example
3.4 Device Memories and Data Transfer
3.5 Kernel Functions and Threading
3.6 Summary
3.6.1 Function declarations
3.6.2 Kernel launch
3.6.3 Predefined variables
3.6.4 Runtime API
CHAPTER 4 CUDA THREADS
4.1 CUDA Thread Organization
4.2 Using blockIdx and threadIdx
4.3 Synchronization and Transparent Scalability
4.4 Thread Assignment
4.5 Thread Scheduling and Latency Tolerance
4.6 Summary
4.7 Exercises
CHAPTER 5 CUDA MEMORIES
5.1 Importance of Memory Access Efficiency
5.2 CUDA Device Memory Types
5.3 A Strategy for Reducing Global Memory Traffic
5.4 Memory as a Limiting Factor to Parallelism
5.5 Summary
5.6 Exercises
CHAPTER 6 PERFORMANCE CONSIDERATIONS
6.1 More on Thread Execution
6.2 Global Memory Bandwidth
6.3 Dynamic Partitioning of SM Resources
6.4 Data Prefetching
6.5 Instruction Mix
6.6 Thread Granularity
6.7 Measured Performance and Summary
6.8 Exercises
CHAPTER 7 FLOATING POINT CONSIDERATIONS
7.1 Floating-Point Format
7.1.1 Normalized Representation of M
7.1.2 Excess Encoding of E
7.2 Representable Numbers
7.3 Special Bit Patterns and Precision
7.4 Arithmetic Accuracy and Rounding
7.5 Algorithm Considerations
7.6 Summary
7.7 Exercises
CHAPTER 8 APPLICATION CASE STUDY: ADVANCED MRI RECONSTRUCTION
8.1 Application Background
8.2 Iterative Reconstruction
8.3 Computing F^Hd
Step 1. Determine the Kernel Parallelism Structure
Step 2. Getting Around the Memory Bandwidth Limitation
Step 3. Using Hardware Trigonometry Functions
Step 4. Experimental Performance Tuning
8.4 Final Evaluation
8.5 Exercises
CHAPTER 9 APPLICATION CASE STUDY: MOLECULAR VISUALIZATION AND ANALYSIS
9.1 Application Background
9.2 A Simple Kernel Implementation
9.3 Instruction Execution Efficiency
9.4 Memory Coalescing
9.5 Additional Performance Comparisons
9.6 Using Multiple GPUs
9.7 Exercises
CHAPTER 10 PARALLEL PROGRAMMING AND COMPUTATIONAL THINKING
10.1 Goals of Parallel Programming
10.2 Problem Decomposition
10.3 Algorithm Selection
10.4 Computational Thinking
10.5 Exercises
CHAPTER 11 A BRIEF INTRODUCTION TO OPENCL
11.1 Background
11.2 Data Parallelism Model
11.3 Device Architecture
11.4 Kernel Functions
11.5 Device Management and Kernel Launch
11.6 Electrostatic Potential Map in OpenCL
11.7 Summary
11.8 Exercises
CHAPTER 12 CONCLUSION AND FUTURE OUTLOOK
12.1 Goals Revisited
12.2 Memory Architecture Evolution
12.2.1 Large Virtual and Physical Address Spaces
12.2.2 Unified Device Memory Space
12.2.3 Configurable Caching and Scratch Pad
12.2.4 Enhanced Atomic Operations
12.2.5 Enhanced Global Memory Access
12.3 Kernel Execution Control Evolution
12.3.1 Function Calls within Kernel Functions
12.3.2 Exception Handling in Kernel Functions
12.3.3 Simultaneous Execution of Multiple Kernels
12.3.4 Interruptible Kernels
12.4 Core Performance
12.4.1 Double-Precision Speed
12.4.2 Better Control Flow Efficiency
12.5 Programming Environment
12.6 A Bright Outlook
APPENDIX A MATRIX MULTIPLICATION HOST-ONLY VERSION SOURCE CODE
A.1 matrixmul.cu
A.2 matrixmul_gold.cpp
A.3 matrixmul.h
A.4 assist.h
A.5 Expected Output
APPENDIX B GPU COMPUTE CAPABILITIES
B.1 GPU Compute Capability Tables
B.2 Memory Coalescing Variations
Index
dbeak

Silver Bug (Somewhat Well-Known)

Thanks to the original poster for sharing.
#8, 2015-06-25 18:37:13
Quick replies
tonyhi (#2)
2015-03-08 21:34
Three-star rating. Thanks for sharing. [sent from the Xiaomuchong client]
FMStation (#3)
2015-03-09 07:09
Five-star rating. Bumping this, thanks for sharing!
anmingkang (#4)
2015-03-09 08:13
Five-star rating. Bumping this, thanks for sharing!
2015-03-09 08:52
Five-star rating. Bumping this, thanks for sharing!
truebelief (#6)
2015-03-10 10:17
Five-star rating. Bumping this, thanks for sharing!
dbeak (#7)
2015-06-25 18:24
Five-star rating. Bumping this, thanks for sharing!
2015-10-28 23:32
Five-star rating. Bumping this, thanks for sharing!
yinxzy (#10)
2015-12-01 22:40
Five-star rating. Bumping this, thanks for sharing!
Nanobee (#11)
2016-04-02 11:27
Five-star rating. Bumping this, thanks for sharing!
liu123338 (#12)
2016-10-17 11:34
Five-star rating. Bumping this, thanks for sharing!
2017-09-20 23:39
Five-star rating. Bumping this, thanks for sharing!