21 results found
1
A PCIe congestion-aware performance model for densely populated accelerator servers
November 2016
SC '16: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
Publisher: IEEE Press
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 11, Downloads (12 Months): 72, Downloads (Overall): 72
Full text available:
PDF
MeteoSwiss, the Swiss national weather forecast institute, has selected densely populated accelerator servers as their primary system to compute weather forecast simulation. Servers with multiple accelerator devices that are primarily connected by a PCI-Express (PCIe) network achieve a significantly higher energy efficiency. Memory transfers between accelerators in such a system ...
Keywords:
PCI-express, multiple GPUs, performance model
2
Efficient implementation of quantum materials simulations on distributed CPU-GPU systems
November 2015
SC '15: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
Publisher: ACM
Bibliometrics:
Citation Count: 1
Downloads (6 Weeks): 5, Downloads (12 Months): 72, Downloads (Overall): 191
Full text available:
PDF
We present a scalable implementation of the Linearized Augmented Plane Wave method for distributed memory systems, which relies on an efficient distributed, block-cyclic setup of the Hamiltonian and overlap matrices and allows us to turn around highly accurate 1000+ atom all-electron quantum materials simulations on clusters with a few hundred ...
3
November 2015
SC '15: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
Publisher: ACM
Bibliometrics:
Citation Count: 5
Downloads (6 Weeks): 10, Downloads (12 Months): 83, Downloads (Overall): 182
Full text available:
PDF
Many high-performance computing applications solving partial differential equations (PDEs) can be attributed to the class of kernels using stencils on structured grids. Due to the disparity between floating point operation throughput and main memory bandwidth these codes typically achieve only a low fraction of peak performance. Unfortunately, stencil computation optimization ...
Keywords:
stencil, atmospheric model, domain-specific language, heterogeneous system
4
Application centric energy-efficiency study of distributed multi-core and hybrid CPU-GPU systems
November 2014
SC '14: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
Publisher: IEEE Press
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 1, Downloads (12 Months): 25, Downloads (Overall): 221
Full text available:
PDF
We study the energy used by a production-level regional climate and weather simulation code on a distributed memory system with hybrid CPU-GPU nodes. The code is optimised for both processor architectures, for which we investigate both time and energy to solution. Operational constraints for time to solution can be met ...
5
Tridiagonalization of a dense symmetric matrix on multiple GPUs and its application to symmetric eigenvalue problems
November 2014
Concurrency and Computation: Practice & Experience: Volume 26 Issue 16, November 2014
Publisher: John Wiley and Sons Ltd.
For software to fully exploit the computing power of emerging heterogeneous computers, not only must the required computational kernels be optimized for the specific hardware architectures but also an effective scheduling scheme is needed to utilize the available heterogeneous computational units and to hide the communication between them. As a ...
Keywords:
GPU accelerators, symmetric matrix-vector multiplication, parallel eigensolver, dense linear algebra, symmetric tridiagonal reduction
6
A novel hybrid CPU-GPU generalized eigensolver for electronic structure calculations based on fine-grained memory aware tasks
May 2014
International Journal of High Performance Computing Applications: Volume 28 Issue 2, May 2014
Publisher: Sage Publications, Inc.
The adoption of hybrid CPU-GPU nodes in traditional supercomputing platforms such as the Cray-XK6 opens acceleration opportunities for electronic structure calculations in materials science and chemistry applications, where medium-sized generalized eigenvalue problems must be solved many times. These eigenvalue problems are too small to effectively solve on distributed systems, but ...
Keywords:
hybrid, electronic structure calculations, generalized eigensolver, high performance, multicore, Eigensolver, GPU, two-stage
7
Towards a performance portable, architecture agnostic implementation strategy for weather and climate models
April 2014
Supercomputing Frontiers and Innovations: an International Journal: Volume 1 Issue 1, April 2014
Publisher: South Ural State University
We propose a software implementation strategy for complex weather and climate models that produces performance portable, architecture agnostic codes. It relies on domain and data structure specific tools that are usable within common model development frameworks - Fortran today and possibly high-level programming environments like Python in the future. We ...
Keywords:
climate modeling, hybrid computing, programming models, numerical weather prediction
8
Taking a quantum leap in time to solution for simulations of high-Tc superconductors
November 2013
SC '13: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Publisher: ACM
Bibliometrics:
Citation Count: 1
Downloads (6 Weeks): 5, Downloads (12 Months): 50, Downloads (Overall): 401
Full text available:
PDF
We present a new quantum cluster algorithm to simulate models of high-Tc superconductors. This algorithm extends current methods with continuous lattice self-energies, thereby removing artificial long-range correlations. This cures the fermionic sign problem in the underlying quantum Monte Carlo solver for large clusters and realistic values of the Coulomb interaction ...
9
Topic 14+16: high-performance and scientific applications and extreme-scale computing
August 2013
Euro-Par'13: Proceedings of the 19th international conference on Parallel Processing
Publisher: Springer-Verlag
As our understanding of the world around us increases it becomes more challenging to make use of what we already know, and to increase our understanding still further. Computational modeling and simulation have become critical tools in addressing this challenge. The requirements of high-resolution, accurate modeling have outstripped the ability ...
10
Early experiences with scientific applications on the IBM Blue Gene/Q supercomputer
S. Alam, C. Bekas, H. Boettiger, A. Curioni, G. Fourestey, W. Homberg, M. Knobloch, T. Laino, T. Maurer, B. Mohr, D. Pleiter, A. Schiller, T. Schulthess, V. Weber
January 2013
IBM Journal of Research and Development: Volume 57 Issue 1, January 2013
Publisher: IBM Corp.
We report early experiences with porting highly complex scientific applications to the IBM Blue Gene®/Q platform. In addition, we report our progress in porting performance analysis tools that are deemed to be key in helping users understand massively parallel, massively threaded applications. Porting proved to be quite a smooth process. ...
11
Poster: A Novel Hybrid CPU-GPU Generalized Eigensolver for Electronic Structure Calculations Based on Fine Grained Memory Aware Tasks
November 2012
SCC '12: Proceedings of the 2012 SC Companion: High Performance Computing, Networking Storage and Analysis
Publisher: IEEE Computer Society
The adoption of hybrid GPU-CPU nodes in traditional supercomputing platforms such as the Cray-XK6 opens acceleration opportunities for electronic structure calculations in materials science and chemistry applications, where medium-sized generalized eigenvalue problems must be solved many times. These eigenvalue problems are too small to scale on distributed systems, but can ...
Keywords:
generalized eigenvalue problem, eigenvalue and eigenvectors computation, 2-stage algorithm, hybrid computing, GPU
12
Abstract: A Novel Hybrid CPU-GPU Generalized Eigensolver for Electronic Structure Calculations Based on Fine Grained Memory Aware Tasks
November 2012
SCC '12: Proceedings of the 2012 SC Companion: High Performance Computing, Networking Storage and Analysis
Publisher: IEEE Computer Society
The adoption of hybrid GPU-CPU nodes in traditional supercomputing platforms such as the Cray-XK6 opens acceleration opportunities for electronic structure calculations in materials science and chemistry applications, where medium-sized generalized eigenvalue problems must be solved many times. These eigenvalue problems are too small to effectively solve on distributed systems, but ...
Keywords:
generalized eigenvalue problem, eigenvalue and eigenvectors computation, 2-stage algorithm, hybrid computing, GPU
13
Towards autotuning by alternating communication methods
October 2012
ACM SIGMETRICS Performance Evaluation Review: Volume 40 Issue 2, September 2012
Publisher: ACM
Bibliometrics:
Citation Count: 1
Downloads (6 Weeks): 1, Downloads (12 Months): 8, Downloads (Overall): 63
Full text available:
PDF
Interconnects in emerging high performance computing systems feature hardware support for one-sided, asynchronous communication and global address space programming models in order to improve parallel efficiency and productivity by allowing communication and computation overlap and out-of-order delivery. In practice though, complex interactions between the software stack and the communication ...
Keywords:
autotuning, PGAS, one-sided communication
14
Towards autotuning by alternating communication methods
November 2011
PMBS '11: Proceedings of the second international workshop on Performance modeling, benchmarking and simulation of high performance computing systems
Publisher: ACM
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 2, Downloads (12 Months): 4, Downloads (Overall): 33
Full text available:
PDF
Interconnects in emerging high performance computing systems feature hardware support for one-sided, asynchronous communication and global address space programming models in order to improve parallel efficiency and productivity by allowing communication and computation overlap and out-of-order delivery. In practice though, complex interactions between the software stack and the communication hardware ...
Keywords:
PGAS, autotuning, code generation, one-sided communication, MMPS, RDMA
15
Toward First Principles Electronic Structure Simulations of Excited States and Strong Correlations in Nano- and Materials Science
November 2010
SC '10: Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
Publisher: IEEE Computer Society
Bibliometrics:
Citation Count: 2
Downloads (6 Weeks): 1, Downloads (12 Months): 8, Downloads (Overall): 273
Full text available:
PDF
Methods based on the many-body Green's function are generally accepted as the path forward beyond Kohn-Sham based density functional theory, in order to compute from first principles electronic structure of materials with strong correlations and excited-state properties in nano- and materials science. Here we present an efficient method to compute ...
16
November 2009
SC '09: Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Publisher: ACM
Bibliometrics:
Citation Count: 5
Downloads (6 Weeks): 5, Downloads (12 Months): 21, Downloads (Overall): 314
Full text available:
PDF
Calculating the thermodynamics of nanoscale systems presents challenges in the simultaneous treatment of the electronic structure, which determines the interactions between atoms, and the statistical fluctuations that become ever more important at shorter length scales. Here we present a highly scalable method that combines ab initio electronic structure techniques, we ...
17
Accuracy and performance of graphics processors: A Quantum Monte Carlo application case study
March 2009
Parallel Computing: Volume 35 Issue 3, March 2009
Publisher: Elsevier Science Publishers B. V.
The tradeoffs of accuracy and performance are as yet an unsolved problem when dealing with Graphics Processing Units (GPUs) as a general-purpose computation device. Their high performance and low cost make them a desirable target for scientific computation, and new language efforts help address the programming challenges of data parallel ...
Keywords:
GPU, Parallel computing, Quantum Monte Carlo, Accuracy, Graphics processors, Performance
18
New algorithm to enable 400+ TFlop/s sustained performance in simulations of disorder effects in high-Tc superconductors
G. Alvarez, M. S. Summers, D. E. Maxwell, M. Eisenbach, J. S. Meredith, J. M. Larkin, J. Levesque, T. A. Maier, P. R. C. Kent, E. F. D'Azevedo, T. C. Schulthess
November 2008
SC '08: Proceedings of the 2008 ACM/IEEE conference on Supercomputing
Publisher: IEEE Press
Bibliometrics:
Citation Count: 3
Downloads (6 Weeks): 5, Downloads (12 Months): 12, Downloads (Overall): 441
Full text available:
PDF
Staggering computational and algorithmic advances in recent years now make possible systematic Quantum Monte Carlo (QMC) simulations of high temperature (high-Tc) superconductivity in a microscopic model, the two dimensional (2D) Hubbard model, with parameters relevant to the cuprate materials. Here we report the algorithmic and computational advances that enable us ...
19
Toward material-specific simulations of high temperature superconductivity
November 2006
SC '06: Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Publisher: ACM
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 2, Downloads (12 Months): 6, Downloads (Overall): 73
Full text available:
Html
High temperature superconductors could potentially revolutionize the use and transmission of electric power. This along with intriguing scientific questions have motivated an enormous research effort over the past twenty years, since the discovery of high temperature superconducting cuprates. But only recently, with the advent of massively parallel vector supercomputers and ...
20
A. Canning, B. Ujfalussy, T. C. Schulthess, X.-G. Zhang, W. A. Shelton, D. M. C. Nicholson, G. M. Stocks, Yang Wang, T. Dirks
April 2003
IPDPS '03: Proceedings of the 17th International Symposium on Parallel and Distributed Processing
Publisher: IEEE Computer Society
Massively parallel computers have been used to perform first principles spin dynamics (SD) simulations of the magnetic structure of γ-FeMn (Iron-Manganese). The code uses a novel parallel approach to solve the Kohn-Sham equations obtained from density functional theory (DFT). This approach uses limited local communications avoiding the large global communications ...
Keywords:
parallel computing, materials science, message passing