Views: 1045  |  Replies: 3

hakuna

Muchong (Well-Known Author)

[Discussion] A newly published book on electronic structure calculations on GPUs (3 participants so far)

Electronic Structure Calculations on Graphics Processing Units: From Quantum Chemistry to Condensed Matter Physics
Ross C. Walker (Editor), Andreas W. Goetz (Editor)
ISBN: 978-1-118-66178-9
376 pages
February 2016

Electronic Structure Calculations on Graphics Processing Units: From Quantum Chemistry to Condensed Matter Physics provides an overview of computing on graphics processing units (GPUs), a brief introduction to GPU programming, and the latest examples of code developments and applications for the most widely used electronic structure methods.

The book covers all commonly used basis sets, including localized Gaussian- and Slater-type basis functions, plane waves, wavelets, and real-space grid-based approaches. The chapters detail the calculation of two-electron integrals, exchange-correlation quadrature, Fock matrix formation, solution of the self-consistent field equations, calculation of nuclear gradients to obtain forces, and methods to treat excited states within DFT. Other chapters focus on semiempirical and correlated wave function methods, including density-fitted second-order Møller–Plesset perturbation theory and both iterative and perturbative single- and multireference coupled cluster methods.

Electronic Structure Calculations on Graphics Processing Units: From Quantum Chemistry to Condensed Matter Physics presents an accessible overview of the field for graduate students and senior researchers of theoretical and computational chemistry, condensed matter physics and materials science, as well as software developers looking for an entry point into the realm of GPU and hybrid GPU/CPU programming for electronic structure calculations.

Table of Contents
List of Contributors xiii
Preface xvii
Acknowledgments xix
Glossary xxi
Abbreviations xxv

1. Why Graphics Processing Units 1
Perri Needham, Andreas W. Götz and Ross C. Walker
1.1 A Historical Perspective of Parallel Computing 1
1.2 The Rise of the GPU 5
1.3 Parallel Computing on Central Processing Units 7
1.3.1 Parallel Programming Memory Models 7
1.3.2 Parallel Programming Languages 8
1.3.3 Types of Parallelism 9
1.3.4 Parallel Performance Considerations 10
1.4 Parallel Computing on Graphics Processing Units 12
1.4.1 GPU Memory Model 12
1.4.2 GPU APIs 12
1.4.3 Suitable Code for GPU Acceleration 13
1.4.4 Scalability, Performance, and Cost Effectiveness 14
1.5 GPU-Accelerated Applications 15
1.5.1 Amber 15
1.5.2 Adobe Premiere Pro CC 18
References 19

2. GPUs: Hardware to Software 23
Perri Needham, Andreas W. Götz and Ross C. Walker
2.1 Basic GPU Terminology 24
2.2 Architecture of GPUs 24
2.2.1 General Nvidia Hardware Features 25
2.2.2 Warp Scheduling 25
2.2.3 Evolution of Nvidia Hardware through the Generations 26
2.3 CUDA Programming Model 26
2.3.1 Kernels 27
2.3.2 Thread Hierarchy 27
2.3.3 Memory Hierarchy 29
2.4 Programming and Optimization Concepts 30
2.4.1 Latency: Memory Access 30
2.4.2 Coalescing Device Memory Accesses 31
2.4.3 Shared Memory Bank Conflicts 31
2.4.4 Latency: Issuing Instructions to Warps 32
2.4.5 Occupancy 32
2.4.6 Synchronous and Asynchronous Execution 33
2.4.7 Stream Programming and Batching 33
2.5 Software Libraries for GPUs 34
2.6 Special Features of CUDA-Enabled GPUs 35
2.6.1 Hyper-Q 35
2.6.2 MPS 35
2.6.3 Unified Memory 35
2.6.4 NVLink 36
References 36

3. Overview of Electronic Structure Methods 39
Andreas W. Götz
3.1 Introduction 39
3.1.1 Computational Complexity 40
3.1.2 Application Fields, from Structures to Spectroscopy 41
3.1.3 Chapter Overview 41
3.2 Hartree–Fock Theory 42
3.2.1 Basis Set Representation 43
3.2.2 Two-Electron Repulsion Integrals 44
3.2.3 Diagonalization 45
3.3 Density Functional Theory 46
3.3.1 Kohn–Sham Theory 46
3.3.2 Exchange-Correlation Functionals 47
3.3.3 Exchange-Correlation Quadrature 49
3.4 Basis Sets 49
3.4.1 Slater-Type Functions 49
3.4.2 Gaussian-Type Functions 50
3.4.3 Plane Waves 51
3.4.4 Representations on a Numerical Grid 52
3.4.5 Auxiliary Basis Sets 52
3.5 Semiempirical Methods 53
3.5.1 Neglect of Diatomic Differential Overlap 53
3.5.2 Fock Matrix Elements 54
3.5.3 Two-Electron Repulsion Integrals 54
3.5.4 Energy and Core Repulsion 55
3.5.5 Models Beyond MNDO 56
3.6 Density Functional Tight Binding 56
3.7 Wave Function-Based Electron Correlation Methods 57
3.7.1 Møller–Plesset Perturbation Theory 59
3.7.2 Coupled Cluster Theory 59
Acknowledgments 60
References 61

4. Gaussian Basis Set Hartree–Fock, Density Functional Theory, and Beyond on GPUs 67
Nathan Luehr, Aaron Sisto and Todd J. Martínez
4.1 Quantum Chemistry Review 68
4.1.1 Self-Consistent Field Equations in Gaussian Basis Sets 68
4.1.2 Electron–Electron Repulsion Integral Evaluation 71
4.2 Hardware and CUDA Overview 72
4.3 GPU ERI Evaluation 73
4.3.1 One-Block-One-Contracted Integral 74
4.3.2 One-Thread-One-Contracted Integral 75
4.3.3 One-Thread-One-Primitive Integral 75
4.3.4 Comparison of Contracted ERI Schemes 76
4.3.5 Extensions to Higher Angular Momentum 77
4.4 Integral-Direct Fock Construction on GPUs 78
4.4.1 GPU J-Engine 79
4.4.2 GPU K-Engine 81
4.4.3 Exchange–Correlation Integration 85
4.5 Precision Considerations 88
4.6 Post-SCF Methods 91
4.7 Example Calculations 93
4.8 Conclusions and Outlook 97
References 98

5. GPU Acceleration for Density Functional Theory with Slater-Type Orbitals 101
Hans van Schoot and Lucas Visscher
5.1 Background 101
5.2 Theory and CPU Implementation 102
5.2.1 Numerical Quadrature of the Fock Matrix 102
5.2.2 CPU Code SCF Performance 103
5.3 GPU Implementation 105
5.3.1 Hardware and Software Requirements 105
5.3.2 GPU Kernel Code 106
5.3.3 Hybrid CPU/GPU Computing Scheme 108
5.3.4 Speed-Up Results for a Single-Point Calculation 110
5.3.5 Speed-Up Results for an Analytical Frequency Calculation 110
5.4 Conclusion 112
References 113

6. Wavelet-Based Density Functional Theory on Massively Parallel Hybrid Architectures 115
Luigi Genovese, Brice Videau, Damien Caliste, Jean-François Méhaut, Stefan Goedecker and Thierry Deutsch
6.1 Introductory Remarks on Wavelet Basis Sets for Density Functional Theory Implementations 115
6.2 Operators in Wavelet Basis Sets 117
6.2.1 Daubechies Wavelets Basis and Convolutions 117
6.2.2 The Kohn–Sham Formalism 119
6.2.3 Three-Dimensional Basis 120
6.2.4 The Kinetic Operator and the Local Potential 121
6.2.5 Poisson Solver 122
6.3 Parallelization 123
6.3.1 MPI Parallel Performance and Architecture Dependence 123
6.4 GPU Architecture 124
6.4.1 GPU Implementation Using the OpenCL Language 125
6.4.2 Implementation Details of the Convolution Kernel 126
6.4.3 Performance of the GPU Convolution Routines 128
6.4.4 Three-Dimensional Operators, Complete BigDFT Code 128
6.4.5 Other GPU Accelerations 132
6.5 Conclusions and Outlook 132
6.5.1 Evaluation of Performance Benefits for Complex Codes 132
References 133

7. Plane-Wave Density Functional Theory 135
Maxwell Hutchinson, Paul Fleurat-Lessard, Ani Anciaux-Sedrakian, Dusan Stosic, Jeroen Bédorf and Sarah Tariq
7.1 Introduction 135
7.2 Theoretical Background 136
7.2.1 Self-Consistent Field 136
7.2.2 Ultrasoft Pseudopotentials 138
7.2.3 Projector Augmented Wave (PAW) Method 138
7.2.4 Force and Stress 139
7.2.5 Iterative Diagonalization 140
7.3 Implementation 143
7.3.1 Transformations 143
7.3.2 Functionals 145
7.3.3 Diagonalization 145
7.3.4 Occupancies 147
7.3.5 Electron Density 147
7.3.6 Forces 147
7.4 Optimizations 148
7.4.1 GPU Optimization Techniques 148
7.4.2 Parallel Optimization Techniques (Off-Node) 150
7.4.3 Numerical Optimization Techniques 151
7.5 Performance Examples 151
7.5.1 Benchmark Settings 151
7.5.2 Self-Consistent Charge Density 154
7.5.3 Band Structure 156
7.5.4 AIMD 157
7.5.5 Structural Relaxation 158
7.6 Exact Exchange with Plane Waves 159
7.6.1 Implementation 160
7.6.2 Optimization 162
7.6.3 Performance/Examples 163
7.7 Summary and Outlook 165
Acknowledgments 165
References 165
Appendix A: Definitions and Conventions 168
Appendix B: Example Kernels 168

8. GPU-Accelerated Sparse Matrix–Matrix Multiplication for Linear Scaling Density Functional Theory 173
Ole Schütt, Peter Messmer, Jürg Hutter and Joost VandeVondele
8.1 Introduction 173
8.1.1 Linear Scaling Self-Consistent Field 173
8.1.2 DBCSR: A Sparse Matrix Library 177
8.2 Software Architecture for GPU-Acceleration 177
8.2.1 Cannon Layer 178
8.2.2 Multrec Layer 179
8.2.3 CSR Layer 179
8.2.4 Scheduler and Driver Layers 179
8.3 Maximizing Asynchronous Progress 180
8.3.1 CUDA Streams and Events 180
8.3.2 Double Buffered Cannon on Host and Device 181
8.4 Libcusmm: GPU Accelerated Small Matrix Multiplications 183
8.4.1 Small Matrix Multiplication Performance Model 183
8.4.2 Matrix-Product Algorithm Choice 183
8.4.3 GPU Implementation: Generic Algorithm 184
8.4.4 Auto-Tuning and Performance 186
8.5 Benchmarks and Conclusions 186
Acknowledgments 189
References 189

9. Grid-Based Projector-Augmented Wave Method 191
Samuli Hakala, Jussi Enkovaara, Ville Havu, Jun Yan, Lin Li, Chris O’Grady and Risto M. Nieminen
9.1 Introduction 191
9.2 General Overview 193
9.2.1 Projector-Augmented Wave Method 193
9.2.2 Uniform Real-Space Grids 195
9.2.3 Multigrid Method 195
9.3 Using GPUs in Ground-State Calculations 196
9.3.1 Stencil Operations 198
9.3.2 Hybrid Level 3 BLAS Functions 198
9.3.3 Parallelization for Multiple GPUs 199
9.3.4 Results 200
9.4 Time-Dependent Density Functional Theory 202
9.4.1 GPU Implementation 202
9.4.2 Results 203
9.5 Random Phase Approximation for the Correlation Energy 203
9.5.1 GPU Implementation 204
9.5.2 Performance Analysis Techniques 205
9.5.3 Results 206
9.6 Summary and Outlook 207
Acknowledgments 208
References 208

10. Application of Graphics Processing Units to Accelerate Real-Space Density Functional Theory and Time-Dependent Density Functional Theory Calculations 211
Xavier Andrade and Alán Aspuru-Guzik
10.1 Introduction 212
10.2 The Real-Space Representation 213
10.3 Numerical Aspects of the Real-Space Approach 214
10.4 General GPU Optimization Strategy 216
10.5 Kohn–Sham Hamiltonian 217
10.6 Orthogonalization and Subspace Diagonalization 221
10.7 Exponentiation 222
10.8 The Hartree Potential 223
10.9 Other Operations 224
10.10 Numerical Performance 225
10.11 Conclusions 228
10.12 Computational Methods 228
Acknowledgments 229
References 229

11. Semiempirical Quantum Chemistry 239
Xin Wu, Axel Koslowski and Walter Thiel
11.1 Introduction 239
11.2 Overview of Semiempirical Methods 240
11.3 Computational Bottlenecks 241
11.4 Profile-Guided Optimization for the Hybrid Platform 244
11.4.1 Full Diagonalization, Density Matrix, and DIIS 244
11.4.2 Pseudo-diagonalization 246
11.4.3 Orthogonalization Corrections in OM3 248
11.5 Performance 249
11.6 Applications 251
11.7 Conclusion 252
Acknowledgement 253
References 253

12. GPU Acceleration of Second-Order Møller–Plesset Perturbation Theory with Resolution of Identity 259
Roberto Olivares-Amaya, Adrian Jinich, Mark A. Watson and Alán Aspuru-Guzik
12.1 Møller–Plesset Perturbation Theory with Resolution of Identity Approximation (RI-MP2) 259
12.1.1 Cleaving General Matrix Multiplies (GEMMs) 262
12.1.2 Other MP2 Approaches 262
12.2 A Mixed-Precision Matrix Multiplication Library 263
12.3 Performance of Accelerated RI-MP2 266
12.3.1 Matrix Benchmarks 266
12.3.2 RI-MP2 Benchmarks 269
12.4 Example Applications 270
12.4.1 Large-Molecule Applications 270
12.4.2 Studying Thermodynamic Reactivity 271
12.5 Conclusions 273
References 273

13. Iterative Coupled-Cluster Methods on Graphics Processing Units 279
A. Eugene DePrince III, Jeff R. Hammond and C. David Sherrill
13.1 Introduction 279
13.2 Related Work 280
13.3 Theory 281
13.3.1 CCD and CCSD 281
13.3.2 Density-Fitted CCSD with a t1-Transformed Hamiltonian 282
13.4 Algorithm Details 284
13.4.1 Communication-Avoiding CCD Algorithm 284
13.4.2 Low-Storage CCSD Algorithm 285
13.4.3 Density-Fitted CCSD with a t1-Transformed Hamiltonian 286
13.5 Computational Details 287
13.5.1 Conventional CCD and CCSD 287
13.5.2 Density-Fitted CCSD 290
13.6 Results 290
13.6.1 Communication-Avoiding CCD 290
13.6.2 Low-Storage CCD and CCSD 292
13.6.3 Density-Fitted CCSD 293
13.7 Conclusions 295
Acknowledgments 296
References 296

14. Perturbative Coupled-Cluster Methods on Graphics Processing Units: Single- and Multi-Reference Formulations 301
Wenjing Ma, Kiran Bhaskaran-Nair, Oreste Villa, Edoardo Aprà, Antonino Tumeo, Sriram Krishnamoorthy and Karol Kowalski
14.1 Introduction 302
14.2 Overview of Electronic Structure Methods 303
14.2.1 Single-Reference Coupled-Cluster Formalisms 303
14.2.2 Multi-Reference Coupled-Cluster Formulations 306
14.3 NWChem Software Architecture 308
14.4 GPU Implementation 309
14.4.1 Kepler Architecture 310
14.4.2 Baseline Implementation 312
14.4.3 Kernel Optimizations 312
14.4.4 Data-Transfer Optimizations 315
14.4.5 CPU–GPU Hybrid Architecture 315
14.5 Performance 315
14.5.1 CCSD(T) Approach 316
14.5.2 MRCCSD(T) Approaches 317
14.6 Outlook 319
Acknowledgments 320
References 320
Index 327

anionxt

Die-Hard Muchong (Famous Writer)


Is there no download link?
Post #2, 2016-03-16 12:42:52

wuli8

Honorary Moderator (Well-Known Author)

…………

…………
Post #3, 2016-03-16 15:23:26

qinform

Muchong (Famous Writer)


Thanks for sharing. It would be great if it could be downloaded.
Post #4, 2016-03-16 15:59:02