33rd ICS 2019: Phoenix, AZ, USA
- Rudolf Eigenmann, Chen Ding, Sally A. McKee:
Proceedings of the ACM International Conference on Supercomputing, ICS 2019, Phoenix, AZ, USA, June 26-28, 2019. ACM 2019, ISBN 978-1-4503-6079-1
HPC applications
- Milinda Fernando, David Neilsen, Eric W. Hirschmann, Hari Sundar:
A scalable framework for adaptive computational general relativity on heterogeneous clusters. 1-12
- Kunpeng Wang, Shizhen Xu, Haohuan Fu, Hongkun Yu, Wenlai Zhao, Guangwen Yang:
Parallelizing cryo-EM 3D reconstruction on GPU cluster with a partitioned and streamed model. 13-23
- Jianqiao Liu, Michael P. Robson, Thomas Quinn, Milind Kulkarni:
Efficient GPU tree walks for effective distributed n-body simulations. 24-34
- Michael Gowanlock:
Hybrid CPU/GPU clustering in shared memory on the billion point scale. 35-45
Accelerator programming
- Abdul Dakkak, Cheng Li, Jinjun Xiong, Isaac Gelado, Wen-Mei W. Hwu:
Accelerating reduction and scan using tensor core units. 46-57
- Wei Zhang, Weihao Cui, Kaihua Fu, Quan Chen, Daniel Edward Mawhirter, Bo Wu, Chao Li, Minyi Guo:
Laius: Towards latency awareness and improved utilization of spatial multitasking accelerators in datacenters. 58-68
- Simon Zhang, Mengbai Xiao, Chengxin Guo, Liang Geng, Hao Wang, Xiaodong Zhang:
HYPHA: a framework based on separation of parallelisms to accelerate persistent homology matrix reduction. 69-81
- Fan Ni, Song Jiang, Hong Jiang, Jian Huang, Xingbo Wu:
SDC: a software defined cache for efficient data indexing. 82-93
HPC algorithms: linear algebra and solvers
- Zhen Xie, Guangming Tan, Weifeng Liu, Ninghui Sun:
IA-SpGEMM: an input-aware auto-tuning framework for parallel sparse matrix-matrix multiplication. 94-105
- Jieyang Chen, Nan Xiong, Xin Liang, Dingwen Tao, Sihuan Li, Kaiming Ouyang, Kai Zhao, Nathan DeBardeleben, Qiang Guan, Zizhong Chen:
TSM2: optimizing tall-and-skinny matrix-matrix multiplication on GPUs. 106-116
- Jakub Kurzak, Mark Gates, Ali Charara, Asim YarKhan, Jack J. Dongarra:
Least squares solvers for distributed-memory machines with GPU accelerators. 117-126
- Piyush Sao, Ramakrishnan Kannan, Xiaoye Sherry Li, Richard W. Vuduc:
A communication-avoiding 3D sparse triangular solver. 127-137
- Paul R. Eller, Torsten Hoefler, William Gropp:
Using performance models to understand scalable Krylov solver performance at scale for structured grid problems. 138-149
- Kurt A. O'Hearn, Abdullah Alperen, Hasan Metin Aktulga:
Performance optimization of reactive molecular dynamics simulations with dynamic charge distribution models on distributed memory platforms. 150-159
HPC computer architectures / accelerators
- Pradeep V. Kotipalli, Ranvijay Singh, Paul Wood, Ignacio Laguna, Saurabh Bagchi:
AMPT-GA: automatic mixed precision floating point tuning for GPU applications. 160-170
- Kyushick Lee, Michael B. Sullivan, Siva Kumar Sastry Hari, Timothy Tsai, Stephen W. Keckler, Mattan Erez:
GPU snapshot: checkpoint offloading for GPU-dense systems. 171-183
- Haonan Wang, Mohamed Assem Ibrahim, Sparsh Mittal, Adwait Jog:
Address-stride assisted approximate load value prediction in GPUs. 184-194
- Hussein Elnawawy, Rangeen Basu Roy Chowdhury, Amro Awad, Gregory T. Byrd:
Diligent TLBs: a mechanism for exploiting heterogeneity in TLB miss behavior. 195-205
- Xin Jin, Yaoyang Zhou, Bowen Huang, Zihao Yu, Xusheng Zhan, Huizhe Wang, Sa Wang, Ningmei Yu, Ninghui Sun, Yungang Bao:
QoSMT: supporting precise performance control for simultaneous multithreading architecture. 206-216
- Yuechen Chen, Ahmed Louri:
An online quality management framework for approximate communication in network-on-chips. 217-226
HPC algorithms: graphs and tensors
- Jiajia Li, Bora Uçar, Ümit V. Çatalyürek, Jimeng Sun, Kevin J. Barker, Richard W. Vuduc:
Efficient and effective sparse tensor reordering. 227-237
- Venkatesan T. Chakaravarthy, Shivmaran S. Pandian, Saurabh Raje, Yogish Sabharwal:
On optimizing distributed non-negative Tucker decomposition. 238-249
- Roozbeh Karimi, David M. Koppelman, Chris J. Michael:
GPU road network graph contraction and SSSP query. 250-260
- Hengjie Wang, Aparna Chandramowlishwaran:
Multi-criteria partitioning of multi-block structured grids. 261-271
Modeling / resource management
- Quan Chen, Zhenning Wang, Jingwen Leng, Chao Li, Wenli Zheng, Minyi Guo:
Avalon: towards QoS awareness and improved utilization through multi-resource management in datacenters. 272-283
- Hao Xu, Qingsen Wang, Shuang Song, Lizy Kurian John, Xu Liu:
Can we trust profiling results?: understanding and fixing the inaccuracy in modern profilers. 284-295
- Dimitrios Chasapis, Miquel Moretó, Martin Schulz, Barry Rountree, Mateo Valero, Marc Casas:
Power efficient job scheduling by predicting the impact of processor manufacturing variability. 296-307
- Hadi Zamani, Yuanlai Liu, Devashree Tripathy, Laxmi N. Bhuyan, Zizhong Chen:
GreenMM: energy efficient GPU matrix multiplication through undervolting. 308-318
Parallel programming
- Huihui Sun, Florian Fey, Jie Zhao, Sergei Gorlatch:
WCCV: improving the vectorization of IF-statements with warp-coherent conditions. 319-329
- Mohammad Norouzi Arab, Felix Wolf, Ali Jannesari:
Automatic construct selection and variable classification in OpenMP. 330-341
- Mihail Popov, Alexandra Jimborean, David Black-Schaffer:
Efficient thread/page/parallelism autotuning for NUMA systems. 342-353
- Philip Pfaffe, Tobias Grosser, Martin Peter Tillmann:
Efficient hierarchical online-autotuning: a case study on polyhedral accelerator mapping. 354-366
Distributed systems
- Abdelhalim Amer, Charles Archer, Michael Blocksome, Chongxiao Cao, Michael Chuvelev, Hajime Fujita, Maria Garzaran, Yanfei Guo, Jeff R. Hammond, Shintaro Iwasaki, Kenneth J. Raffenetti, Mikhail Shiryaev, Min Si, Kenjiro Taura, Sagar Thapaliya, Pavan Balaji:
Software combining to mitigate multithreaded MPI contention. 367-379
- Emilio Castillo, Nikhil Jain, Marc Casas, Miquel Moretó, Martin Schulz, Ramón Beivide, Mateo Valero, Abhinav Bhatele:
Optimizing computation-communication overlap in asynchronous task-based programs. 380-391
- Donghe Kang, Vedang Patel, Ashwati Nair, Spyros Blanas, Yang Wang, Srinivasan Parthasarathy:
Henosis: workload-driven small array consolidation and placement for HDF5 applications on heterogeneous data stores. 392-402
- Cunlu Li, Dezun Dong, Xiangke Liao, John Kim, Changhyun Kim:
DeepHiR: improving high-radix router throughput with deep hybrid memory buffer microarchitecture. 403-413
Machine learning acceleration
- Aleksandar Zlateski, Zhen Jia, Kai Li, Frédo Durand:
The anatomy of efficient FFT and Winograd convolutions on modern CPUs. 414-424
- Karan Aggarwal, Uday Bondhugula:
Optimizing the linear fascicle evaluation algorithm for many-core systems. 425-437
- Lin Ning, Xipeng Shen:
Deep reuse: streamline CNN inference on the fly via coarse-grained computation reuse. 438-448
- Bradley McDanel, Sai Qian Zhang, H. T. Kung, Xin Dong:
Full-stack optimization for accelerating CNNs using powers-of-two weights with FPGA validation. 449-460
- Tong Geng, Tianqi Wang, Chunshu Wu, Chen Yang, Wei Wu, Ang Li, Martin C. Herbordt:
O3BNN: an out-of-order architecture for high-performance binarized neural network inference with fine-grained pruning. 461-472
- Lei Zhao, Quan Deng, Youtao Zhang, Jun Yang:
RFAcc: a 3D ReRAM associative array based random forest accelerator. 473-483
Correctness, efficiency and security
- Bo Fang, Hassan Halawa, Karthik Pattabiraman, Matei Ripeanu, Sriram Krishnamoorthy:
BonVoision: leveraging spatial data smoothness for recovery from memory soft errors. 484-496
- Qiumin Xu, Hoda Naghibijouybari, Shibo Wang, Nael B. Abu-Ghazaleh, Murali Annavaram:
GPUGuard: mitigating contention based side and covert channel attacks on GPUs. 497-509
- Yongbin Gu, Lizhong Chen:
Dynamically linked MSHRs for adaptive miss handling in GPUs. 510-521