


38th IPDPS 2024: San Francisco, CA, USA
- IEEE International Parallel and Distributed Processing Symposium, IPDPS 2024, San Francisco, CA, USA, May 27-31, 2024. IEEE 2024, ISBN 979-8-3503-8711-7

- Zhengding Hu, Jingwei Sun, Zhongyang Li, Guangzhong Sun:
  PckGNN: Optimizing Aggregation Operators with Packing Strategies in Graph Neural Networks. 2-13
- Luhan Wang, Haipeng Jia, Lei Xu, Cunyang Wei, Kun Li, Xianmeng Jiang, Yunquan Zhang:
  VNEC: A Vectorized Non-Empty Column Format for SpMV on CPUs. 14-25
- Ichitaro Yamazaki, Andrew J. Higgins, Erik G. Boman, Daniel B. Szyld:
  Two-Stage Block Orthogonalization to Improve Performance of s-step GMRES. 26-37
- Oded Schwartz, Sivan Toledo, Noa Vaknin, Gal Wiernik:
  Alternative Basis Matrix Multiplication is Fast and Stable. 38-51
- Tianyu Liang, Riley Murray, Aydin Buluç, James Demmel:
  Fast multiplication of random dense matrices with sparse matrices. 52-62
- Takeshi Fukaya, Yuji Nakatsukasa, Yusaku Yamamoto:
  A Cholesky QR type algorithm for computing tall-skinny QR factorization with column pivoting. 63-75
- Yunfei Gu, Yihui Lu, Chentao Wu, Jie Li, Minyi Guo:
  CKSM: An Efficient Memory Deduplication Method for Container-based Cloud Computing Systems. 76-88
- Amelie Chi Zhou, Rongzheng Huang, Zhoubin Ke, Yusen Li, Yi Wang, Rui Mao:
  Tackling Cold Start in Serverless Computing with Multi-Level Container Reuse. 89-99
- Vivek M. Bhasi, Aakash Sharma, Shruti Mohanty, Mahmut Taylan Kandemir, Chita R. Das:
  Paldia: Enabling SLO-Compliant and Cost-Effective Serverless Computing on Heterogeneous Hardware. 100-113
- Moiz Arif, Avinash Maurya, M. Mustafa Rafique, Dimitrios S. Nikolopoulos, Ali Raza Butt:
  Application-Attuned Memory Management for Containerized HPC Workflows. 114-127
- Yunlong Cheng, Xiuqi Huang, Zifeng Liu, Jiadong Chen, Xiaofeng Gao, Zhen Fang, Yongqiang Yang:
  FEDGE: An Interference-Aware QoS Prediction Framework for Black-Box Scenario in IaaS Clouds with Domain Generalization. 128-138
- Marcin Copik, Marcin Chrapek, Larissa Schmid, Alexandru Calotoiu, Torsten Hoefler:
  Software Resource Disaggregation for HPC with Serverless Computing. 139-156
- Haishuang Fan, Rui Meng, Qichu Sun, Jingya Wu, Wenyan Lu, Xiaowei Li, Guihai Yan:
  AMST: Accelerating Large-Scale Graph Minimum Spanning Tree Computation on FPGA. 157-168
- Ilya Kokorin, Victor Yudov, Vitaly Aksenov, Dan Alistarh:
  Wait-free Trees with Asymptotically-Efficient Range Queries. 169-179
- Yves Baumann, Tal Ben-Nun, Maciej Besta, Lukas Gianinazzi, Torsten Hoefler, Piotr Luczynski:
  Low-Depth Spatial Tree Algorithms. 180-192
- Juntao Zhao, Borui Wan, Yanghua Peng, Haibin Lin, Yibo Zhu, Chuan Wu:
  QSync: Quantization-Minimized Synchronous Distributed Training Across Hybrid Devices. 193-204
- Van An Le, Nam Duong Tran, Phuong Nam Nguyen, Thanh Hung Nguyen, Phi Le Nguyen, Truong Thao Nguyen, Yusheng Ji:
  Enhancing the Generalization of Personalized Federated Learning with Multi-head Model and Ensemble Voting. 205-216
- Yifei Li, Ryan Chard, Yadu N. Babuji, Kyle Chard, Ian T. Foster, Zhuozhao Li:
  UniFaaS: Programming across Distributed Cyberinfrastructure with Federated Function Serving. 217-229
- Zhiqian Xu, Honghui Shang, Yi Fan, Xiongzhi Zeng, Yunquan Zhang, Chu Guo:
  Scalable and Differentiable Simulator for Quantum Computational Chemistry. 230-240
- S. M. Ferdous, Reece Neff, Bo Peng, Salman Shuvo, Marco Minutoli, Sayak Mukherjee, Karol Kowalski, Michela Becchi, Mahantesh Halappanavar:
  Picasso: Memory-Efficient Graph Coloring Using Palettes With Applications in Quantum Computing. 241-252
- Niteya Shah, Christine Sweeney, Vinay Ramakrishnaiah, Jeffrey Donatelli, Wu-Chun Feng:
  Optimizing and Scaling the 3D Reconstruction of Single-Particle Imaging. 253-264
- Xiran Zhang, Sameh Abdulah, Jian Cao, Hatem Ltaief, Ying Sun, Marc G. Genton, David E. Keyes:
  Parallel Approximations for High-Dimensional Multivariate Normal Probability Computation in Confidence Region Detection Applications. 265-276
- Zeyu Song, Lin Gan, Shengye Xiang, Yinuo Wang, Xiaohui Duan, Guangwen Yang:
  Enabling High-Performance Physical Based Rendering on New Sunway Supercomputer. 277-288
- Taolei Wang, Chao Li, Jing Wang, Cheng Xu, Xiaofeng Hou, Minyi Guo:
  CoCG: Fine-grained Cloud Game Co-location on Heterogeneous Platform. 289-299
- Thanh Son Phung, Douglas Thain:
  Adaptive Task-Oriented Resource Allocation for Large Dynamic Workflows on Opportunistic Resources. 300-311
- David Álvarez, Kevin Sala, Vicenç Beltran:
  nOS-V: Co-Executing HPC Applications Using System-Wide Task Scheduling. 312-324
- Jing Chen, Madhavan Manivannan, Bhavishya Goel, Miquel Pericàs:
  SWEEP: Adaptive Task Scheduling for Exploring Energy Performance Trade-offs. 325-336
- Baolin Li, Siddharth Samsi, Vijay Gadepally, Devesh Tiwari:
  Interpretable Analysis of Production GPU Clusters Monitoring Data via Association Rule Mining. 337-349
- Jan Laukemann, Thomas Gruber, Georg Hager, Dossay Oryspayev, Gerhard Wellein:
  CloverLeaf on Intel Multi-Core CPUs: A Case Study in Write-Allocate Evasion. 350-360
- Yi-Chien Lin, Yuyang Chen, Sameh Gobriel, Nilesh Jain, Gopi Krishna Jha, Viktor K. Prasanna:
  ARGO: An Auto-Tuning Runtime System for Scalable GNN Training on Multi-Core Processor. 361-372
- Yuke Li, Arjun Kashyap, Weicong Chen, Yanfei Guo, Xiaoyi Lu:
  Accelerating Lossy and Lossless Compression on Emerging BlueField DPU Architectures. 373-385
- Tobias S. Flynn, Robert Manson-Sawko, Gihan R. Mudalige:
  Performance-Portable Multiphase Flow Solutions with Discontinuous Galerkin Methods. 386-397
- Ahmed H. Mahmoud, Hesam Salehipour, Massimiliano Meneghin:
  Optimized GPU Implementation of Grid Refinement in Lattice Boltzmann Method. 398-407
- Herbert Owen, Dominik Ernst, Thomas Gruber, Oriol Lehmkuhl, Guillaume Houzeaux, Lucas Gasparino, Gerhard Wellein:
  Alya towards Exascale: Optimal OpenACC Performance of the Navier-Stokes Finite Element Assembly on GPUs. 408-416
- Zizhe Jian, Sheng Di, Jinyang Liu, Kai Zhao, Xin Liang, Haiying Xu, Robert Underwood, Shixun Wu, Jiajun Huang, Zizhong Chen, Franck Cappello:
  CliZ: Optimizing Lossy Compression for Climate Datasets with Adaptive Fine-tuned Data Prediction. 417-429
- Eric Heisler, Siddharth Saurav, Aadesh Deshmukh, Sandip Mazumder, Hari Sundar:
  Automating GPU Scalability for Complex Scientific Models: Phonon Boltzmann Transport Equation. 430-439
- Tianyu Liang, Chao Chen, Per-Gunnar Martinsson, George Biros:
  An O(N) distributed-memory parallel direct solver for planar integral equations. 440-452
- Marc Blancafort, Roger Ferrer, Guillaume Houzeaux, Marta Garcia-Gasulla, Filippo Mantovani:
  Exploiting long vectors with a CFD code: a co-design show case. 453-464
- Ahmad Tarraf, Alexis Bandet, Francieli Boito, Guillaume Pallez, Felix Wolf:
  Capturing Periodic I/O Using Frequency Techniques. 465-478
- Anxin Guo, Jingwei Li, Pattara Sukprasert, Samir Khuller, Amol Deshpande, Koyel Mukherjee:
  To Store or Not to Store: a graph theoretical approach for Dataset Versioning. 479-493
- Neeraj Rajesh, Keith Bateman, Jean Luca Bez, Suren Byna, Anthony Kougkas, Xian-He Sun:
  TunIO: An AI-powered Framework for Optimizing HPC I/O. 494-505
- Dong Kyu Sung, Yongseok Son, Alex Sim, Kesheng Wu, Suren Byna, Houjun Tang, Hyeonsang Eom, Changjong Kim, Sunggon Kim:
  A2FL: Autonomous and Adaptive File Layout in HPC through Real-time Access Pattern Analysis. 506-518
- Darren Ng, Andrew Lin, Arjun Kashyap, Guanpeng Li, Xiaoyi Lu:
  NVMe-oPF: Designing Efficient Priority Schemes for NVMe-over-Fabrics with Multi-Tenancy Support. 519-531
- Hammad Ather, Jean Luca Bez, Yankun Xia, Suren Byna:
  Drilling Down I/O Bottlenecks with Cross-layer I/O Profile Exploration. 532-543
- Mark Hildebrand, Jason Lowe-Power, Venkatesh Akella:
  CachedArrays: Optimizing Data Movement for Heterogeneous Memory Systems. 545-555
- Junqi Yin, Avishek Bose, Guojing Cong, Isaac Lyngaas, Quentin Anthony:
  Comparative Study of Large Language Model Architectures on Frontier. 556-569
- Daniel Nichols, Alexander Movsesyan, Jae-Seung Yeom, Abhik Sarkar, Daniel Milroy, Tapasya Patki, Abhinav Bhatele:
  Predicting Cross-Architecture Performance of Parallel Programs. 570-581
- Md Hasanur Rahman, Sheng Di, Shengjian Guo, Xiaoyi Lu, Guanpeng Li, Franck Cappello:
  Druto: Upper-Bounding Silent Data Corruption Vulnerability in GPU Applications. 582-594
- Jad El Karchi, Hanze Chen, Ali TehraniJamsaz, Ali Jannesari, Mihail Popov, Emmanuelle Saillard:
  MPI Errors Detection using GNN Embedding and Vector Embedding over LLVM IR. 595-607
- Shuaipeng Zhang, Shiyi Li, Chentao Wu, Ruobin Wu, Saiqin Long, Wen Xia:
  A Parallel Partial Merge Repair Algorithm for Multi-block Failures for Erasure Storage Systems. 608-618
- Payman Behnam, Uday Kamal, Ali Shafiee, Alexey Tumanov, Saibal Mukhopadhyay:
  Harmonica: Hybrid Accelerator to Overcome Imperfections of Mixed-signal DNN Accelerators. 619-630
- Ricardo Nobre, Aleksandar Ilic, Sergio Santander-Jiménez, Leonel Sousa:
  IPU-EpiDet: Identifying Gene Interactions on Massively Parallel Graph-Based AI Accelerators. 631-643
- Malith Jayaweera, Yanyu Li, Yanzhi Wang, Bin Ren, David R. Kaeli:
  DEFCON: Deformable Convolutions Leveraging Interval Search and GPU Texture Hardware. 644-655
- Weile Luo, Ruibo Fan, Zeyu Li, Dayou Du, Qiang Wang, Xiaowen Chu:
  Benchmarking and Dissecting the Nvidia Hopper GPU Architecture. 656-667
- Emanuele Del Sozzo, Xinyuan Wang, Boma A. Adhi, Carlos Cortes, Jason Anderson, Kentaro Sano:
  Exploration of Trade-offs Between General-Purpose and Specialized Processing Elements in HPC-Oriented CGRA. 668-680
- Abeda Sultana, Fei Xu, Xu Yuan, Li Chen, Nian-Feng Tzeng:
  Hadar: Heterogeneity-Aware Optimization-Based Online Scheduling for Deep Learning Cluster. 681-691
- Chen Chen, Xingbo Wu, Wenshao Zhong, Jakob Eriksson:
  Fast Abort-Freedom for Deterministic Transactions. 692-704
- Di Zhang, Monish Soundar Raj, Bing Xie, Sheng Di, Dong Dai:
  Cross-System Analysis of Job Characterization and Scheduling in Large-Scale Computing Clusters. 716-727
- Srinjoy Das, Lawrence Rauchwerger:
  Automatic Task Parallelization of Dataflow Graphs in ML/DL Models. 728-739
- Thomas B. Rolinger, Alan Sussman:
  Adaptive Prefetching for Fine-grain Communication in PGAS Programs. 740-751
- Jiajun Huang, Sheng Di, Xiaodong Yu, Yujia Zhai, Zhaorui Zhang, Jinyang Liu, Xiaoyi Lu, Ken Raffenetti, Hui Zhou, Kai Zhao, Zizhong Chen, Franck Cappello, Yanfei Guo, Rajeev Thakur:
  An Optimized Error-controlled MPI Collective Framework Integrated with Lossy Compression. 752-764
- Zijian Li, Zixuan Chen, Yiying Tang, Xin Ai, Yuanyi Zhu, Zhigao Zhao, Jiang Shao, Guowei Liu, Sen Liu, Bin Liu, Yang Xu:
  MUSE: A Runtime Incrementally Reconfigurable Network Adapting to HPC Real-Time Traffic. 765-779
- Zicheng Wang, Zirui Zhuang, Jingyu Wang, Qi Qi, Haifeng Sun, Jianxin Liao:
  Fast Policy Convergence for Traffic Engineering with Proactive Distributed Message-Passing. 780-790
- Chongshan Liang, Yi Dai, Jun Xia, Jinbo Xu, Jintao Peng, Weixia Xu, Ming Xie, Jie Liu, Zhiquan Lai, Sheng Ma, Qi Zhu:
  The Self-adaptive and Topology-aware MPI_Bcast leveraging Collective offload on Tianhe Express Interconnect. 791-801
- Bharath Ramesh, Nick Contini, Nawras Alnaasan, Kaushik Kandadi Suresh, Mustafa Abduljabbar, Aamir Shafi, Hari Subramoni, Dhabaleswar K. D. K. Panda:
  HINT: Designing Cache-Efficient MPI_Alltoall using Hybrid Memory Copy Ordering and Non-Temporal Instructions. 802-813
- Tu Dinh Ngoc, Boris Teabe, Georges Da Costa, Daniel Hagimont:
  Flexible NVMe Request Routing for Virtual Machines. 814-824
- Xiang Chen, Tao Lu, Jiapin Wang, Yu Zhong, Guangchun Xie, Xueming Cao, Yuanpeng Ma, Bing Si, Feng Ding, Ying Yang, Yunxin Huang, Yafei Yang, You Zhou, Fei Wu:
  HA-CSD: Host and SSD Coordinated Compression for Capacity and Performance. 825-838
- Md Nahid Newaz, Sayan Ghosh, Joshua Suetterlein, Nathan R. Tallent, Md Atiqul Mollah, Ming Hua:
  Graph Analytics on Jellyfish topology. 839-851
- Sukarn Agarwal, Shounak Chakraborty, Magnus Själander:
  TEEMO: Temperature Aware Energy Efficient Multi-Retention STT-RAM Cache Architecture. 852-864
- Li Wan, Fu Chao, Qiang Li, Jun Han:
  LockillerTM: Enhancing Performance Lower Bounds in Best-Effort Hardware Transactional Memory. 865-875
- Pengmiao Zhang, Neelesh Gupta, Rajgopal Kannan, Viktor K. Prasanna:
  Attention, Distillation, and Tabularization: Towards Practical Neural Network-Based Prefetching. 876-888
- Jiaqi Yang, Hao Zheng, Ahmed Louri:
  Aurora: A Versatile and Flexible Accelerator for Graph Neural Networks. 890-902
- Lihan Hu, Jing Li, Peng Jiang:
  cuKE: An Efficient Code Generator for Score Function Computation in Knowledge Graph Embedding. 903-914
- Jinghan Yao, Quentin Anthony, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
  Exploiting Inter-Layer Expert Affinity for Accelerating Mixture-of-Experts Model Inference. 915-925
- Gangda Deng, Hongkuan Zhou, Hanqing Zeng, Yinglong Xia, Christopher Leung, Jianbo Li, Rajgopal Kannan, Viktor K. Prasanna:
  TASER: Temporal Adaptive Sampling for Fast and Accurate Dynamic Graph Representation Learning. 926-937
- Ruge Zhang, Haipeng Jia, Yunquan Zhang, Baicheng Yan, Penghao Ma, Long Wang, Wenxuan Zhao:
  OpenFFT-SME: An Efficient Outer Product Pattern FFT Library on ARM SME CPUs. 938-949
- Evangelos Georganas, Dhiraj D. Kalamkar, Kirill Voronin, Abhisek Kundu, Antonio Noack, Hans Pabst, Alexander Breuer, Alexander Heinecke:
  Harnessing Deep Learning and HPC Kernels via High-Level Loop and Tensor Abstractions on CPU Architectures. 950-963
- Kainan Yu, Xinxin Qi, Peng Zhang, Jianbin Fang, Dezun Dong, Ruibo Wang, Tao Tang, Chun Huang, Yonggang Che, Zheng Wang:
  Optimizing General Matrix Multiplications on Modern Multi-core DSPs. 964-975
- Yufan Xia, Giuseppe Maria Junior Barca:
  Machine-Learning-Driven Runtime Optimization of BLAS Level 3 on Modern Multi-Core Systems. 976-986
- Debasish Pattanayak, Gokarna Sharma:
  Time-Color Tradeoff on Uniform Circle Formation by Asynchronous Robots. 987-997
- Xiaohai Dai, Guanxiong Wang, Jiang Xiao, Zhengxuan Guo, Rui Hao, Xia Xie, Hai Jin:
  LightDAG: A Low-latency DAG-based BFT Consensus through Lightweight Broadcast. 998-1008
- Rongyuan Tan, Zhuozhao Li:
  MAAD: A Distributed Anomaly Detection Architecture for Microservices Systems. 1009-1021
- Jérémie Decouchant, David Kozhaya, Vincent Rahli, Jiangshan Yu:
  OneShot: View-Adapting Streamlined BFT Protocols with Trusted Execution Environments. 1022-1033
- Alexandre Valentin Jamet, Georgios Vavouliotis, Daniel A. Jiménez, Lluc Alvarez, Marc Casas:
  Practically Tackling Memory Bottlenecks of Graph-Processing Workloads. 1034-1045
- Yihua Wei, Peng Jiang:
  GCSM: GPU-Accelerated Continuous Subgraph Matching for Large Graphs. 1046-1057
- Sam Coy, Artur Czumaj, Peter Davies-Peck, Gopinath Mishra:
  Parallel Derandomization for Coloring. 1058-1069
- Jiangbo Li, Zichen Xu, Minh Pham, Yicheng Tu, Qihe Zhou:
  A Comparative Study of Intersection-Based Triangle Counting Algorithms on GPUs. 1070-1081
