pkusiyuan

[Resource] Programming Massively Parallel Processors (2010)

Contents
Preface
Acknowledgments
Dedication
CHAPTER 1 INTRODUCTION
  1.1 GPUs as Parallel Computers
  1.2 Architecture of a Modern GPU
  1.3 Why More Speed or Parallelism?
  1.4 Parallel Programming Languages and Models
  1.5 Overarching Goals
  1.6 Organization of the Book
CHAPTER 2 HISTORY OF GPU COMPUTING
  2.1 Evolution of Graphics Pipelines
    2.1.1 The Era of Fixed-Function Graphics Pipelines
    2.1.2 Evolution of Programmable Real-Time Graphics
    2.1.3 Unified Graphics and Computing Processors
    2.1.4 GPGPU: An Intermediate Step
  2.2 GPU Computing
    2.2.1 Scalable GPUs
    2.2.2 Recent Developments
  2.3 Future Trends
CHAPTER 3 INTRODUCTION TO CUDA
  3.1 Data Parallelism
  3.2 CUDA Program Structure
  3.3 A Matrix–Matrix Multiplication Example
  3.4 Device Memories and Data Transfer
  3.5 Kernel Functions and Threading
  3.6 Summary
    3.6.1 Function declarations
    3.6.2 Kernel launch
    3.6.3 Predefined variables
    3.6.4 Runtime API
CHAPTER 4 CUDA THREADS
  4.1 CUDA Thread Organization
  4.2 Using blockIdx and threadIdx
  4.3 Synchronization and Transparent Scalability
  4.4 Thread Assignment
  4.5 Thread Scheduling and Latency Tolerance
  4.6 Summary
  4.7 Exercises
CHAPTER 5 CUDA MEMORIES
  5.1 Importance of Memory Access Efficiency
  5.2 CUDA Device Memory Types
  5.3 A Strategy for Reducing Global Memory Traffic
  5.4 Memory as a Limiting Factor to Parallelism
  5.5 Summary
  5.6 Exercises
CHAPTER 6 PERFORMANCE CONSIDERATIONS
  6.1 More on Thread Execution
  6.2 Global Memory Bandwidth
  6.3 Dynamic Partitioning of SM Resources
  6.4 Data Prefetching
  6.5 Instruction Mix
  6.6 Thread Granularity
  6.7 Measured Performance and Summary
  6.8 Exercises
CHAPTER 7 FLOATING POINT CONSIDERATIONS
  7.1 Floating-Point Format
    7.1.1 Normalized Representation of M
    7.1.2 Excess Encoding of E
  7.2 Representable Numbers
  7.3 Special Bit Patterns and Precision
  7.4 Arithmetic Accuracy and Rounding
  7.5 Algorithm Considerations
  7.6 Summary
  7.7 Exercises
CHAPTER 8 APPLICATION CASE STUDY: ADVANCED MRI RECONSTRUCTION
  8.1 Application Background
  8.2 Iterative Reconstruction
  8.3 Computing F^H d
    Step 1. Determine the Kernel Parallelism Structure
    Step 2. Getting Around the Memory Bandwidth Limitation
    Step 3. Using Hardware Trigonometry Functions
    Step 4. Experimental Performance Tuning
  8.4 Final Evaluation
  8.5 Exercises
CHAPTER 9 APPLICATION CASE STUDY: MOLECULAR VISUALIZATION AND ANALYSIS
  9.1 Application Background
  9.2 A Simple Kernel Implementation
  9.3 Instruction Execution Efficiency
  9.4 Memory Coalescing
  9.5 Additional Performance Comparisons
  9.6 Using Multiple GPUs
  9.7 Exercises
CHAPTER 10 PARALLEL PROGRAMMING AND COMPUTATIONAL THINKING
  10.1 Goals of Parallel Programming
  10.2 Problem Decomposition
  10.3 Algorithm Selection
  10.4 Computational Thinking
  10.5 Exercises
CHAPTER 11 A BRIEF INTRODUCTION TO OPENCL
  11.1 Background
  11.2 Data Parallelism Model
  11.3 Device Architecture
  11.4 Kernel Functions
  11.5 Device Management and Kernel Launch
  11.6 Electrostatic Potential Map in OpenCL
  11.7 Summary
  11.8 Exercises
CHAPTER 12 CONCLUSION AND FUTURE OUTLOOK
  12.1 Goals Revisited
  12.2 Memory Architecture Evolution
    12.2.1 Large Virtual and Physical Address Spaces
    12.2.2 Unified Device Memory Space
    12.2.3 Configurable Caching and Scratch Pad
    12.2.4 Enhanced Atomic Operations
    12.2.5 Enhanced Global Memory Access
  12.3 Kernel Execution Control Evolution
    12.3.1 Function Calls within Kernel Functions
    12.3.2 Exception Handling in Kernel Functions
    12.3.3 Simultaneous Execution of Multiple Kernels
    12.3.4 Interruptible Kernels
  12.4 Core Performance
    12.4.1 Double-Precision Speed
    12.4.2 Better Control Flow Efficiency
  12.5 Programming Environment
  12.6 A Bright Outlook
APPENDIX A MATRIX MULTIPLICATION HOST-ONLY VERSION SOURCE CODE
  A.1 matrixmul.cu
  A.2 matrixmul_gold.cpp
  A.3 matrixmul.h
  A.4 assist.h
  A.5 Expected Output
APPENDIX B GPU COMPUTE CAPABILITIES
  B.1 GPU Compute Capability Tables
  B.2 Memory Coalescing Variations
Index

» Attachments in this post

Attachment 1: 大规模并行处理器程序设计.(Programming.Massively.Parallel.Processors.A.Hands-on.Approach),.Kirk,.Hwu,.文字版.pdf

2015-03-08 20:58:14, 4.74 M

Post #1, 2015-03-08 20:58:17

dbeak

Thanks to the original poster for sharing.

Post #8, 2015-06-25 18:37:13

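Quick replies

tonyhi, post #2

2015-03-08 21:34

Three-star rating. Thanks for sharing.

FMStation, post #3

2015-03-09 07:09

Five-star rating. Bumping this, thanks for sharing!

anmingkang, post #4

2015-03-09 08:13

Five-star rating. Bumping this, thanks for sharing!

springcxliu, post #5

2015-03-09 08:52

Five-star rating. Bumping this, thanks for sharing!

truebelief, post #6

2015-03-10 10:17

Five-star rating. Bumping this, thanks for sharing!

dbeak, post #7

2015-06-25 18:24

Five-star rating. Bumping this, thanks for sharing!

wangkun7673, post #9

2015-10-28 23:32

Five-star rating. Bumping this, thanks for sharing!

yinxzy, post #10

2015-12-01 22:40

Five-star rating. Bumping this, thanks for sharing!

Nanobee, post #11

2016-04-02 11:27

Five-star rating. Bumping this, thanks for sharing!

liu123338, post #12

2016-10-17 11:34

Five-star rating. Bumping this, thanks for sharing!

逍遥学生TT, post #13

2017-09-20 23:39

Five-star rating. Bumping this, thanks for sharing!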
