24小时热门版块排行榜    

查看: 1245  |  回复: 12
【奖励】 本帖被评价11次,作者pkusiyuan增加金币 8.6

pkusiyuan

银虫 (正式写手)


[资源] 2010Programming.Massively.Parallel.Processors

Contents
Preface ......................................................................................................................xi
Acknowledgments ................................................................................................ xvii
Dedication...............................................................................................................xix
CHAPTER 1 INTRODUCTION................................................................................1
1.1 GPUs as Parallel Computers ..........................................................2
1.2 Architecture of a Modern GPU......................................................8
1.3 Why More Speed or Parallelism? ................................................10
1.4 Parallel Programming Languages and Models............................13
1.5 Overarching Goals ........................................................................15
1.6 Organization of the Book.............................................................16
CHAPTER 2 HISTORY OF GPU COMPUTING .....................................................21
2.1 Evolution of Graphics Pipelines ..................................................21
2.1.1 The Era of Fixed-Function Graphics Pipelines..................22
2.1.2 Evolution of Programmable Real-Time Graphics .............26
2.1.3 Unified Graphics and Computing Processors ....................29
2.1.4 GPGPU: An Intermediate Step...........................................31
2.2 GPU Computing ...........................................................................32
2.2.1 Scalable GPUs.....................................................................33
2.2.2 Recent Developments..........................................................34
2.3 Future Trends................................................................................34
CHAPTER 3 INTRODUCTION TO CUDA..............................................................39
3.1 Data Parallelism............................................................................39
3.2 CUDA Program Structure ............................................................41
3.3 A Matrix–Matrix Multiplication Example...................................42
3.4 Device Memories and Data Transfer...........................................46
3.5 Kernel Functions and Threading..................................................51
3.6 Summary.......................................................................................56
3.6.1 Function declarations ..........................................................56
3.6.2 Kernel launch ......................................................................56
3.6.3 Predefined variables ............................................................56
3.6.4 Runtime API........................................................................57
CHAPTER 4 CUDA THREADS.............................................................................59
4.1 CUDA Thread Organization ........................................................59
4.2 Using blockIdx and threadIdx ..........................................64
4.3 Synchronization and Transparent Scalability ..............................68
vii
4.4 Thread Assignment.......................................................................70
4.5 Thread Scheduling and Latency Tolerance .................................71
4.6 Summary .......................................................................................74
4.7 Exercises .......................................................................................74
CHAPTER 5 CUDA MEMORIES.......................................................................77
5.1 Importance of Memory Access Efficiency..................................78
5.2 CUDA Device Memory Types ....................................................79
5.3 A Strategy for Reducing Global Memory Traffic.......................83
5.4 Memory as a Limiting Factor to Parallelism ..............................90
5.5 Summary .......................................................................................92
5.6 Exercises .......................................................................................93
CHAPTER 6 PERFORMANCE CONSIDERATIONS................................................95
6.1 More on Thread Execution ..........................................................96
6.2 Global Memory Bandwidth........................................................103
6.3 Dynamic Partitioning of SM Resources ....................................111
6.4 Data Prefetching .........................................................................113
6.5 Instruction Mix ...........................................................................115
6.6 Thread Granularity .....................................................................116
6.7 Measured Performance and Summary .......................................118
6.8 Exercises .....................................................................................120
CHAPTER 7 FLOATING POINT CONSIDERATIONS ...........................................125
7.1 Floating-Point Format.................................................................126
7.1.1 Normalized Representation of M.....................................126
7.1.2 Excess Encoding of E.......................................................127
7.2 Representable Numbers ..............................................................129
7.3 Special Bit Patterns and Precision.............................................134
7.4 Arithmetic Accuracy and Rounding ..........................................135
7.5 Algorithm Considerations...........................................................136
7.6 Summary .....................................................................................138
7.7 Exercises .....................................................................................138
CHAPTER 8 APPLICATION CASE STUDY: ADVANCED MRI
RECONSTRUCTION.......................................................................141
8.1 Application Background.............................................................142
8.2 Iterative Reconstruction..............................................................144
8.3 Computing FHd...........................................................................148
Step 1. Determine the Kernel Parallelism Structure .................149
Step 2. Getting Around the Memory Bandwidth Limitation....156
viii Contents
Step 3. Using Hardware Trigonometry Functions ....................163
Step 4. Experimental Performance Tuning ...............................166
8.4 Final Evaluation..........................................................................167
8.5 Exercises .....................................................................................170
CHAPTER 9 APPLICATION CASE STUDY: MOLECULAR VISUALIZATION
AND ANALYSIS............................................................................173
9.1 Application Background.............................................................174
9.2 A Simple Kernel Implementation ..............................................176
9.3 Instruction Execution Efficiency................................................180
9.4 Memory Coalescing....................................................................182
9.5 Additional Performance Comparisons .......................................185
9.6 Using Multiple GPUs .................................................................187
9.7 Exercises .....................................................................................188
CHAPTER 10 PARALLEL PROGRAMMING AND COMPUTATIONAL
THINKING ....................................................................................191
10.1 Goals of Parallel Programming ...............................................192
10.2 Problem Decomposition ...........................................................193
10.3 Algorithm Selection .................................................................196
10.4 Computational Thinking...........................................................202
10.5 Exercises ...................................................................................204
CHAPTER 11 A BRIEF INTRODUCTION TO OPENCL ......................................205
11.1 Background...............................................................................205
11.2 Data Parallelism Model............................................................207
11.3 Device Architecture..................................................................209
11.4 Kernel Functions ......................................................................211
11.5 Device Management and Kernel Launch ................................212
11.6 Electrostatic Potential Map in OpenCL ..................................214
11.7 Summary...................................................................................219
11.8 Exercises ...................................................................................220
CHAPTER 12 CONCLUSION AND FUTURE OUTLOOK ........................................221
12.1 Goals Revisited.........................................................................221
12.2 Memory Architecture Evolution ..............................................223
12.2.1 Large Virtual and Physical Address Spaces ................223
12.2.2 Unified Device Memory Space ....................................224
12.2.3 Configurable Caching and Scratch Pad........................225
12.2.4 Enhanced Atomic Operations .......................................226
12.2.5 Enhanced Global Memory Access ...............................226
Contents ix
12.3 Kernel Execution Control Evolution .......................................227
12.3.1 Function Calls within Kernel Functions ......................227
12.3.2 Exception Handling in Kernel Functions.....................227
12.3.3 Simultaneous Execution of Multiple Kernels ..............228
12.3.4 Interruptible Kernels .....................................................228
12.4 Core Performance.....................................................................229
12.4.1 Double-Precision Speed ...............................................229
12.4.2 Better Control Flow Efficiency ....................................229
12.5 Programming Environment ......................................................230
12.6 A Bright Outlook......................................................................230
APPENDIX A MATRIX MULTIPLICATION HOST-ONLY VERSION
SOURCE CODE .............................................................................233
A.1 matrixmul.cu........................................................................233
A.2 matrixmul_gold.cpp .........................................................237
A.3 matrixmul.h..........................................................................238
A.4 assist.h .................................................................................239
A.5 Expected Output .........................................................................243
APPENDIX B GPU COMPUTE CAPABILITIES ....................................................245
B.1 GPU Compute Capability Tables...............................................245
B.2 Memory Coalescing Variations..................................................246
Index......................................................................................................... 251
回复此楼

» 本帖附件资源列表

» 收录本帖的淘帖专辑推荐

Algorithm love physics 电子书资料 CUDA
科研软件

» 猜你喜欢

已阅   回复此楼   关注TA 给TA发消息 送TA红花 TA的回帖

dbeak

银虫 (小有名气)


感谢楼主分享
8楼2015-06-25 18:37:13
已阅   回复此楼   关注TA 给TA发消息 送TA红花 TA的回帖
简单回复
tonyhi2楼
2015-03-08 21:34   回复  
三星好评  谢谢分享 [ 发自小木虫客户端 ]
FMStation3楼
2015-03-09 07:09   回复  
五星好评  顶一下,感谢分享!
2015-03-09 08:13   回复  
五星好评  顶一下,感谢分享!
2015-03-09 08:52   回复  
五星好评  顶一下,感谢分享!
2015-03-10 10:17   回复  
五星好评  顶一下,感谢分享!
dbeak7楼
2015-06-25 18:24   回复  
五星好评  顶一下,感谢分享!
2015-10-28 23:32   回复  
五星好评  顶一下,感谢分享!
yinxzy10楼
2015-12-01 22:40   回复  
五星好评  顶一下,感谢分享!
Nanobee11楼
2016-04-02 11:27   回复  
五星好评  顶一下,感谢分享!
liu12333812楼
2016-10-17 11:34   回复  
五星好评  顶一下,感谢分享!
2017-09-20 23:39   回复  
五星好评  顶一下,感谢分享!
相关版块跳转 我要订阅楼主 pkusiyuan 的主题更新
☆ 无星级 ★ 一星级 ★★★ 三星级 ★★★★★ 五星级
普通表情 高级回复 (可上传附件)
最具人气热帖推荐 [查看全部] 作者 回/看 最后发表
[考研] 303求调剂 +5 安忆灵 2026-03-22 6/300 2026-03-22 12:46 by 素颜倾城1988
[考研] 306求调剂 +4 来好运来来来 2026-03-22 4/200 2026-03-22 12:43 by 素颜倾城1988
[考研] 广西大学材料导师推荐 +3 夏夏夏小正 2026-03-17 5/250 2026-03-21 22:20 by 金昊ML
[考研] 求调剂 +5 十三加油 2026-03-21 5/250 2026-03-21 18:48 by 学员8dgXkO
[考研] 材料工程专硕 348分求调剂 +3 冬辞. 2026-03-17 5/250 2026-03-21 18:47 by 学员8dgXkO
[考研] 【考研调剂】化学专业 281分,一志愿四川大学,诚心求调剂 +11 吃吃吃才有意义 2026-03-19 11/550 2026-03-21 18:23 by 学员8dgXkO
[考研] 307求调剂 +3 余意卿 2026-03-18 3/150 2026-03-21 17:31 by ColorlessPI
[考研] 302求调剂 +12 呼呼呼。。。。 2026-03-17 12/600 2026-03-21 17:29 by ColorlessPI
[考研] 306求调剂 +4 chuanzhu川烛 2026-03-18 4/200 2026-03-21 08:25 by laoshidan
[考研] 265求调剂 +3 Jack?k?y 2026-03-17 3/150 2026-03-21 03:17 by JourneyLucky
[考研] 初始318分求调剂(有工作经验) +3 1911236844 2026-03-17 3/150 2026-03-21 02:33 by JourneyLucky
[考研] 化学求调剂 +4 临泽境llllll 2026-03-17 5/250 2026-03-21 02:23 by JourneyLucky
[考研] 288求调剂 +16 于海海海海 2026-03-19 16/800 2026-03-20 22:28 by JourneyLucky
[考研] 一志愿 西北大学 ,070300化学学硕,总分287,双非一本,求调剂。 +4 晨昏线与星海 2026-03-19 4/200 2026-03-20 22:15 by JourneyLucky
[考研] 求调剂,一志愿:南京航空航天大学大学 ,080500材料科学与工程学硕,总分289分 +4 @taotao 2026-03-19 4/200 2026-03-20 22:14 by JourneyLucky
[考研] 290求调剂 +7 ^O^乜 2026-03-19 7/350 2026-03-20 21:43 by JourneyLucky
[考研] 308求调剂 +4 是Lupa啊 2026-03-16 4/200 2026-03-17 17:12 by ruiyingmiao
[考研] 一志愿苏州大学材料工程(085601)专硕有科研经历三项国奖两个实用型专利一项省级立项 +6 大火山小火山 2026-03-16 8/400 2026-03-17 15:05 by 无懈可击111
[考博] 26申博 +4 八6八68 2026-03-16 4/200 2026-03-17 13:00 by 轻松不少随
[考研] 275求调剂 +4 太阳花天天开心 2026-03-16 4/200 2026-03-17 10:53 by 功夫疯狂
信息提示
请填处理意见