Views: 1045 | Replies: 3
hakuna, Muchong member (Well-Known Author)
[Discussion]
A newly published book on electronic structure calculations on GPUs (3 participants so far)
Electronic Structure Calculations on Graphics Processing Units: From Quantum Chemistry to Condensed Matter Physics
Ross C. Walker (Editor), Andreas W. Goetz (Editor)
ISBN: 978-1-118-66178-9
376 pages
February 2016

Electronic Structure Calculations on Graphics Processing Units: From Quantum Chemistry to Condensed Matter Physics provides an overview of computing on graphics processing units (GPUs), a brief introduction to GPU programming, and the latest examples of code developments and applications for the most widely used electronic structure methods.

The book covers all commonly used basis sets including localized Gaussian and Slater type basis functions, plane waves, wavelets and real-space grid-based approaches. The chapters expose details on the calculation of two-electron integrals, exchange-correlation quadrature, Fock matrix formation, solution of the self-consistent field equations, calculation of nuclear gradients to obtain forces, and methods to treat excited states within DFT. Other chapters focus on semiempirical and correlated wave function methods including density fitted second order Møller-Plesset perturbation theory and both iterative and perturbative single- and multireference coupled cluster methods.

Electronic Structure Calculations on Graphics Processing Units: From Quantum Chemistry to Condensed Matter Physics presents an accessible overview of the field for graduate students and senior researchers of theoretical and computational chemistry, condensed matter physics and materials science, as well as software developers looking for an entry point into the realm of GPU and hybrid GPU/CPU programming for electronic structure calculations.

Table of Contents

List of Contributors
Preface
Acknowledgments
Glossary
Abbreviations

1. Why Graphics Processing Units
Perri Needham, Andreas W. Götz and Ross C. Walker
  1.1 A Historical Perspective of Parallel Computing
  1.2 The Rise of the GPU
  1.3 Parallel Computing on Central Processing Units
    1.3.1 Parallel Programming Memory Models
    1.3.2 Parallel Programming Languages
    1.3.3 Types of Parallelism
    1.3.4 Parallel Performance Considerations
  1.4 Parallel Computing on Graphics Processing Units
    1.4.1 GPU Memory Model
    1.4.2 GPU APIs
    1.4.3 Suitable Code for GPU Acceleration
    1.4.4 Scalability, Performance, and Cost Effectiveness
  1.5 GPU-Accelerated Applications
    1.5.1 Amber
    1.5.2 Adobe Premier Pro CC
  References

2. GPUs: Hardware to Software
Perri Needham, Andreas W. Götz and Ross C. Walker
  2.1 Basic GPU Terminology
  2.2 Architecture of GPUs
    2.2.1 General Nvidia Hardware Features
    2.2.2 Warp Scheduling
    2.2.3 Evolution of Nvidia Hardware through the Generations
  2.3 CUDA Programming Model
    2.3.1 Kernels
    2.3.2 Thread Hierarchy
    2.3.3 Memory Hierarchy
  2.4 Programming and Optimization Concepts
    2.4.1 Latency: Memory Access
    2.4.2 Coalescing Device Memory Accesses
    2.4.3 Shared Memory Bank Conflicts
    2.4.4 Latency: Issuing Instructions to Warps
    2.4.5 Occupancy
    2.4.6 Synchronous and Asynchronous Execution
    2.4.7 Stream Programming and Batching
  2.5 Software Libraries for GPUs
  2.6 Special Features of CUDA-Enabled GPUs
    2.6.1 Hyper-Q
    2.6.2 MPS
    2.6.3 Unified Memory
    2.6.4 NVLink
  References

3. Overview of Electronic Structure Methods
Andreas W. Götz
  3.1 Introduction
    3.1.1 Computational Complexity
    3.1.2 Application Fields, from Structures to Spectroscopy
    3.1.3 Chapter Overview
  3.2 Hartree–Fock Theory
    3.2.1 Basis Set Representation
    3.2.2 Two-Electron Repulsion Integrals
    3.2.3 Diagonalization
  3.3 Density Functional Theory
    3.3.1 Kohn–Sham Theory
    3.3.2 Exchange-Correlation Functionals
    3.3.3 Exchange-Correlation Quadrature
  3.4 Basis Sets
    3.4.1 Slater-Type Functions
    3.4.2 Gaussian-Type Functions
    3.4.3 Plane Waves
    3.4.4 Representations on a Numerical Grid
    3.4.5 Auxiliary Basis Sets
  3.5 Semiempirical Methods
    3.5.1 Neglect of Diatomic Differential Overlap
    3.5.2 Fock Matrix Elements
    3.5.3 Two-Electron Repulsion Integrals
    3.5.4 Energy and Core Repulsion
    3.5.5 Models Beyond MNDO
  3.6 Density Functional Tight Binding
  3.7 Wave Function-Based Electron Correlation Methods
    3.7.1 Møller–Plesset Perturbation Theory
    3.7.2 Coupled Cluster Theory
  Acknowledgments
  References

4. Gaussian Basis Set Hartree–Fock, Density Functional Theory, and Beyond on GPUs
Nathan Luehr, Aaron Sisto and Todd J. Martínez
  4.1 Quantum Chemistry Review
    4.1.1 Self-Consistent Field Equations in Gaussian Basis Sets
    4.1.2 Electron–Electron Repulsion Integral Evaluation
  4.2 Hardware and CUDA Overview
  4.3 GPU ERI Evaluation
    4.3.1 One-Block-One-Contracted Integral
    4.3.2 One-Thread-One-Contracted Integral
    4.3.3 One-Thread-One-Primitive Integral
    4.3.4 Comparison of Contracted ERI Schemes
    4.3.5 Extensions to Higher Angular Momentum
  4.4 Integral-Direct Fock Construction on GPUs
    4.4.1 GPU J-Engine
    4.4.2 GPU K-Engine
    4.4.3 Exchange–Correlation Integration
  4.5 Precision Considerations
  4.6 Post-SCF Methods
  4.7 Example Calculations
  4.8 Conclusions and Outlook
  References

5. GPU Acceleration for Density Functional Theory with Slater-Type Orbitals
Hans van Schoot and Lucas Visscher
  5.1 Background
  5.2 Theory and CPU Implementation
    5.2.1 Numerical Quadrature of the Fock Matrix
    5.2.2 CPU Code SCF Performance
  5.3 GPU Implementation
    5.3.1 Hardware and Software Requirements
    5.3.2 GPU Kernel Code
    5.3.3 Hybrid CPU/GPU Computing Scheme
    5.3.4 Speed-Up Results for a Single-Point Calculation
    5.3.5 Speed-Up Results for an Analytical Frequency Calculation
  5.4 Conclusion
  References

6. Wavelet-Based Density Functional Theory on Massively Parallel Hybrid Architectures
Luigi Genovese, Brice Videau, Damien Caliste, Jean-François Méhaut, Stefan Goedecker and Thierry Deutsch
  6.1 Introductory Remarks on Wavelet Basis Sets for Density Functional Theory Implementations
  6.2 Operators in Wavelet Basis Sets
    6.2.1 Daubechies Wavelets Basis and Convolutions
    6.2.2 The Kohn–Sham Formalism
    6.2.3 Three-Dimensional Basis
    6.2.4 The Kinetic Operator and the Local Potential
    6.2.5 Poisson Solver
  6.3 Parallelization
    6.3.1 MPI Parallel Performance and Architecture Dependence
  6.4 GPU Architecture
    6.4.1 GPU Implementation Using the OpenCL Language
    6.4.2 Implementation Details of the Convolution Kernel
    6.4.3 Performance of the GPU Convolution Routines
    6.4.4 Three-Dimensional Operators, Complete BigDFT Code
    6.4.5 Other GPU Accelerations
  6.5 Conclusions and Outlook
    6.5.1 Evaluation of Performance Benefits for Complex Codes
  References

7. Plane-Wave Density Functional Theory
Maxwell Hutchinson, Paul Fleurat-Lessard, Ani Anciaux-Sedrakian, Dusan Stosic, Jeroen Bédorf and Sarah Tariq
  7.1 Introduction
  7.2 Theoretical Background
    7.2.1 Self-Consistent Field
    7.2.2 Ultrasoft Pseudopotentials
    7.2.3 Projector Augmented Wave (PAW) Method
    7.2.4 Force and Stress
    7.2.5 Iterative Diagonalization
  7.3 Implementation
    7.3.1 Transformations
    7.3.2 Functionals
    7.3.3 Diagonalization
    7.3.4 Occupancies
    7.3.5 Electron Density
    7.3.6 Forces
  7.4 Optimizations
    7.4.1 GPU Optimization Techniques
    7.4.2 Parallel Optimization Techniques (Off-Node)
    7.4.3 Numerical Optimization Techniques
  7.5 Performance Examples
    7.5.1 Benchmark Settings
    7.5.2 Self-Consistent Charge Density
    7.5.3 Band Structure
    7.5.4 AIMD
    7.5.5 Structural Relaxation
  7.6 Exact Exchange with Plane Waves
    7.6.1 Implementation
    7.6.2 Optimization
    7.6.3 Performance/Examples
  7.7 Summary and Outlook
  Acknowledgments
  References
  Appendix A: Definitions and Conventions
  Appendix B: Example Kernels

8. GPU-Accelerated Sparse Matrix–Matrix Multiplication for Linear Scaling Density Functional Theory
Ole Schütt, Peter Messmer, Jürg Hutter and Joost VandeVondele
  8.1 Introduction
    8.1.1 Linear Scaling Self-Consistent Field
    8.1.2 DBCSR: A Sparse Matrix Library
  8.2 Software Architecture for GPU-Acceleration
    8.2.1 Cannon Layer
    8.2.2 Multrec Layer
    8.2.3 CSR Layer
    8.2.4 Scheduler and Driver Layers
  8.3 Maximizing Asynchronous Progress
    8.3.1 CUDA Streams and Events
    8.3.2 Double Buffered Cannon on Host and Device
  8.4 Libcusmm: GPU Accelerated Small Matrix Multiplications
    8.4.1 Small Matrix Multiplication Performance Model
    8.4.2 Matrix-Product Algorithm Choice
    8.4.3 GPU Implementation: Generic Algorithm
    8.4.4 Auto-Tuning and Performance
  8.5 Benchmarks and Conclusions
  Acknowledgments
  References

9. Grid-Based Projector-Augmented Wave Method
Samuli Hakala, Jussi Enkovaara, Ville Havu, Jun Yan, Lin Li, Chris O’Grady and Risto M. Nieminen
  9.1 Introduction
  9.2 General Overview
    9.2.1 Projector-Augmented Wave Method
    9.2.2 Uniform Real-Space Grids
    9.2.3 Multigrid Method
  9.3 Using GPUs in Ground-State Calculations
    9.3.1 Stencil Operations
    9.3.2 Hybrid Level 3 BLAS Functions
    9.3.3 Parallelization for Multiple GPUs
    9.3.4 Results
  9.4 Time-Dependent Density Functional Theory
    9.4.1 GPU Implementation
    9.4.2 Results
  9.5 Random Phase Approximation for the Correlation Energy
    9.5.1 GPU Implementation
    9.5.2 Performance Analysis Techniques
    9.5.3 Results
  9.6 Summary and Outlook
  Acknowledgments
  References

10. Application of Graphics Processing Units to Accelerate Real-Space Density Functional Theory and Time-Dependent Density Functional Theory Calculations
Xavier Andrade and Alán Aspuru-Guzik
  10.1 Introduction
  10.2 The Real-Space Representation
  10.3 Numerical Aspects of the Real-Space Approach
  10.4 General GPU Optimization Strategy
  10.5 Kohn–Sham Hamiltonian
  10.6 Orthogonalization and Subspace Diagonalization
  10.7 Exponentiation
  10.8 The Hartree Potential
  10.9 Other Operations
  10.10 Numerical Performance
  10.11 Conclusions
  10.12 Computational Methods
  Acknowledgments
  References

11. Semiempirical Quantum Chemistry
Xin Wu, Axel Koslowski and Walter Thiel
  11.1 Introduction
  11.2 Overview of Semiempirical Methods
  11.3 Computational Bottlenecks
  11.4 Profile-Guided Optimization for the Hybrid Platform
    11.4.1 Full Diagonalization, Density Matrix, and DIIS
    11.4.2 Pseudo-diagonalization
    11.4.3 Orthogonalization Corrections in OM3
  11.5 Performance
  11.6 Applications
  11.7 Conclusion
  Acknowledgement
  References

12. GPU Acceleration of Second-Order Møller–Plesset Perturbation Theory with Resolution of Identity
Roberto Olivares-Amaya, Adrian Jinich, Mark A. Watson and Alán Aspuru-Guzik
  12.1 Møller–Plesset Perturbation Theory with Resolution of Identity Approximation (RI-MP2)
    12.1.1 Cleaving General Matrix Multiplies (GEMMs)
    12.1.2 Other MP2 Approaches
  12.2 A Mixed-Precision Matrix Multiplication Library
  12.3 Performance of Accelerated RI-MP2
    12.3.1 Matrix Benchmarks
    12.3.2 RI-MP2 Benchmarks
  12.4 Example Applications
    12.4.1 Large-Molecule Applications
    12.4.2 Studying Thermodynamic Reactivity
  12.5 Conclusions
  References

13. Iterative Coupled-Cluster Methods on Graphics Processing Units
A. Eugene DePrince III, Jeff R. Hammond and C. David Sherrill
  13.1 Introduction
  13.2 Related Work
  13.3 Theory
    13.3.1 CCD and CCSD
    13.3.2 Density-Fitted CCSD with a t1-Transformed Hamiltonian
  13.4 Algorithm Details
    13.4.1 Communication-Avoiding CCD Algorithm
    13.4.2 Low-Storage CCSD Algorithm
    13.4.3 Density-Fitted CCSD with a t1-Transformed Hamiltonian
  13.5 Computational Details
    13.5.1 Conventional CCD and CCSD
    13.5.2 Density-Fitted CCSD
  13.6 Results
    13.6.1 Communication-Avoiding CCD
    13.6.2 Low-Storage CCD and CCSD
    13.6.3 Density-Fitted CCSD
  13.7 Conclusions
  Acknowledgments
  References

14. Perturbative Coupled-Cluster Methods on Graphics Processing Units: Single- and Multi-Reference Formulations
Wenjing Ma, Kiran Bhaskaran-Nair, Oreste Villa, Edoardo Aprà, Antonino Tumeo, Sriram Krishnamoorthy and Karol Kowalski
  14.1 Introduction
  14.2 Overview of Electronic Structure Methods
    14.2.1 Single-Reference Coupled-Cluster Formalisms
    14.2.2 Multi-Reference Coupled-Cluster Formulations
  14.3 NWChem Software Architecture
  14.4 GPU Implementation
    14.4.1 Kepler Architecture
    14.4.2 Baseline Implementation
    14.4.3 Kernel Optimizations
    14.4.4 Data-Transfer Optimizations
    14.4.5 CPU–GPU Hybrid Architecture
  14.5 Performance
    14.5.1 CCSD(T) Approach
    14.5.2 MRCCSD(T) Approaches
  14.6 Outlook
  Acknowledgments
  References

Index
anionxt
Floor 2, 2016-03-16 12:42:52
wuli8
Floor 3, 2016-03-16 15:23:26
qinform
Floor 4, 2016-03-16 15:59:02











