Dhabaleswar K. Panda 0001
Dhabaleswar K. D. K. Panda – Dhabaleswar Kumar Panda 0001
Person information
- affiliation: Ohio State University, Columbus, USA
2020 – today
- 2024
- [j65] Dhabaleswar K. Panda, Vipin Chaudhary, Eric Fosler-Lussier, Raghu Machiraju, Amit Majumdar, Beth Plale, Rajiv Ramnath, Ponnuswamy Sadayappan, Neelima Savardekar, Karen Tomko:
  Creating intelligent cyberinfrastructure for democratizing AI. AI Mag. 45(1): 22-28 (2024)
- [j64] Tu Tran, Bharath Ramesh, Benjamin Michalowicz, Mustafa Abduljabbar, Hari Subramoni, Aamir Shafi, Dhabaleswar K. Panda:
  Accelerating communication with multi-HCA aware collectives in MPI. Concurr. Comput. Pract. Exp. 36(1) (2024)
- [c515] Lang Xu, Quentin Anthony, Qinghua Zhou, Nawras Alnaasan, Radha Gulhane, Aamir Shafi, Hari Subramoni, Dhabaleswar K. D. K. Panda:
  Accelerating Large Language Model Training with Hybrid GPU-based Compression. CCGrid 2024: 196-205
- [c514] Nawras Alnaasan, Horng-Ruey Huang, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
  Characterizing Communication in Distributed Parameter-Efficient Fine-Tuning for Large Language Models. HOTI 2024: 11-19
- [c513] Tu Tran, Goutham Kalikrishna Reddy Kuncham, Bharath Ramesh, Shulei Xu, Hari Subramoni, Mustafa Abduljabbar, Dhabaleswar K. Panda:
  OHIO: Improving RDMA Network Scalability in MPI_Alltoall Through Optimized Hierarchical and Intra/Inter-Node Communication Overlap Design. HOTI 2024: 47-56
- [c512] Quentin Anthony, Benjamin Michalowicz, Jacob Hatef, Lang Xu, Mustafa Abdul Jabbar, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
  Demystifying the Communication Characteristics for Distributed Transformer Models. HOTI 2024: 57-65
- [c511] Quentin Anthony, Jacob Hatef, Deepak Narayanan, Stella Biderman, Stas Bekman, Junqi Yin, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
  The Case for Co-Designing Model Architectures with Hardware. ICPP 2024: 84-96
- [c510] Dhabaleswar K. Panda, Hari Subramoni:
  Message from the HCW 2024 Technical Program Committee Co-Chairs. IPDPS (Workshops) 2024: 1
- [c509] Dhabaleswar K. Panda, Hari Subramoni:
  Message from the HCW 2024 Technical Program Committee Co-Chairs. IPDPS (Workshops) 2024: 4
- [c508] HooYoung Ahn, SeonYoung Kim, Yoo-Mi Park, Woojong Han, Nick Contini, Bharath Ramesh, Mustafa Abduljabbar, Dhabaleswar K. Panda:
  Towards Accelerating k-NN with MPI and Near-Memory Processing. IPDPS (Workshops) 2024: 608-615
- [c507] Mingzhe Han, Goutham Kalikrishna Reddy Kuncham, Benjamin Michalowicz, Rahul Vaidya, Mustafa Abduljabbar, Aamir Shafi, Hari Subramoni, Dhabaleswar K. D. K. Panda:
  PML-MPI: A Pre-Trained ML Framework for Efficient Collective Algorithm Selection in MPI. IPDPS (Workshops) 2024: 761-770
- [c506] Bharath Ramesh, Nick Contini, Nawras Alnaasan, Kaushik Kandadi Suresh, Mustafa Abduljabbar, Aamir Shafi, Hari Subramoni, Dhabaleswar K. D. K. Panda:
  HINT: Designing Cache-Efficient MPI_Alltoall using Hybrid Memory Copy Ordering and Non-Temporal Instructions. IPDPS 2024: 802-813
- [c505] Jinghan Yao, Quentin Anthony, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
  Exploiting Inter-Layer Expert Affinity for Accelerating Mixture-of-Experts Model Inference. IPDPS 2024: 915-925
- [c504] Qinghua Zhou, Bharath Ramesh, Aamir Shafi, Mustafa Abduljabbar, Hari Subramoni, Dhabaleswar K. Panda:
  Accelerating MPI AllReduce Communication with Efficient GPU-Based Compression Schemes on Modern GPU Clusters. ISC 2024: 1-12
- [c503] Nicholas Contini, Mustafa Abduljabbar, Hari Subramoni, Dhabaleswar K. Panda:
  OMB-FPGA: A Microbenchmark Suite for FPGA-aware MPIs using OpenCL and SYCL. PEARC 2024: 1:1-1:9
- [c502] Radha Gulhane, Quentin Anthony, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
  Infer-HiRes: Accelerating Inference for High-Resolution Images with Quantization and Distributed Deep Learning. PEARC 2024: 5:1-5:9
- [c501] Chen-Chun Chen, Goutham Kalikrishna Reddy Kuncham, Pouya Kousha, Hari Subramoni, Dhabaleswar K. Panda:
  Design and Implementation of an IPC-based Collective MPI Library for Intel GPUs. PEARC 2024: 17:1-17:9
- [c500] Tu Tran, Mustafa Abduljabbar, HooYoung Ahn, SeonYoung Kim, Yoo-Mi Park, Woojong Han, Shin-Young Ahn, Hari Subramoni, Dhabaleswar K. Panda:
  OMB-CXL: A Micro-Benchmark Suite for Evaluating MPI Communication Utilizing Compute Express Link Memory Devices. PEARC 2024: 27:1-27:8
- [i19] Jinghan Yao, Quentin Anthony, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
  Exploiting Inter-Layer Expert Affinity for Accelerating Mixture-of-Experts Model Inference. CoRR abs/2401.08383 (2024)
- [i18] Quentin Anthony, Jacob Hatef, Deepak Narayanan, Stella Biderman, Stas Bekman, Junqi Yin, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
  The Case for Co-Designing Model Architectures with Hardware. CoRR abs/2401.14489 (2024)
- [i17] Quentin Anthony, Benjamin Michalowicz, Jacob Hatef, Lang Xu, Mustafa Abduljabbar, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
  Demystifying the Communication Characteristics for Distributed Transformer Models. CoRR abs/2408.10197 (2024)
- [i16] Jinghan Yao, Sam Ade Jacobs, Masahiro Tanaka, Olatunji Ruwase, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
  Training Ultra Long Context Language Model with Fully Pipelined Distributed Transformer. CoRR abs/2408.16978 (2024)
- [i15] Lang Xu, Quentin Anthony, Qinghua Zhou, Nawras Alnaasan, Radha Gulhane, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
  Accelerating Large Language Model Training with Hybrid GPU-based Compression. CoRR abs/2409.02423 (2024)
- 2023
- [j63] Kawthar Shafie Khorassani, Chen-Chun Chen, Bharath Ramesh, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
  High Performance MPI over the Slingshot Interconnect. J. Comput. Sci. Technol. 38(1): 128-145 (2023)
- [j62] Kaushik Kandadi Suresh, Kawthar Shafie Khorassani, Chen-Chun Chen, Bharath Ramesh, Mustafa Abduljabbar, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
  Network-Assisted Noncontiguous Transfers for GPU-Aware MPI Libraries. IEEE Micro 43(2): 131-139 (2023)
- [c499] Pouya Kousha, Qinghua Zhou, Hari Subramoni, Dhabaleswar K. Panda:
  Benchmarking Modern Databases for Storing and Profiling Very Large Scale HPC Communication Data. Bench 2023: 104-119
- [c498] Nawras Alnaasan, Matthew Lieber, Aamir Shafi, Hari Subramoni, Scott A. Shearer, Dhabaleswar K. Panda:
  HARVEST: High-Performance Artificial Vision Framework for Expert Labeling using Semi-Supervised Training. IEEE Big Data 2023: 139-148
- [c497] Kinan Al-Attar, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
  MPI4Spark Meets YARN: Enhancing MPI4Spark through YARN support for HPC. IEEE Big Data 2023: 2265-2274
- [c496] Chen-Chun Chen, Kawthar Shafie Khorassani, Goutham Kalikrishna Reddy Kuncham, Rahul Vaidya, Mustafa Abduljabbar, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
  Implementing and Optimizing a GPU-aware MPI Library for Intel GPUs: Early Experiences. CCGrid 2023: 131-140
- [c495] Quentin Anthony, Lang Xu, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
  ScaMP: Scalable Meta-Parallelism for Deep Learning Search. CCGridW 2023: 346-348
- [c494] Quentin Anthony, Lang Xu, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
  ScaMP: Scalable Meta-Parallelism for Deep Learning Search. CCGrid 2023: 391-402
- [c493] Dhabaleswar K. D. K. Panda:
  How to Educate HPC-Enabled AI and Data Science to Students and Professionals in a Holistic Manner? HiPCW 2023: 4
- [c492] Shulei Xu, Goutham Kalikrishna Reddy Kuncham, Mustafa Abduljabbar, Hari Subramoni, Dhabaleswar K. D. K. Panda:
  Optimized All-to-All Connection Establishment for High-Performance MPI Libraries Over InfiniBand. HiPC 2023: 41-50
- [c491] Jinghan Yao, Nawras Alnaasan, Tian Chen, Aamir Shafi, Hari Subramoni, Dhabaleswar K. D. K. Panda:
  Flover: A Temporal Fusion Framework for Efficient Autoregressive Model Parallel Inference. HiPC 2023: 107-116
- [c490] Bharath Ramesh, Goutham Kalikrishna Reddy Kuncham, Kaushik Kandadi Suresh, Rahul Vaidya, Nawras Alnaasan, Mustafa Abduljabbar, Aamir Shafi, Hari Subramoni, Dhabaleswar K. D. K. Panda:
  Designing In-network Computing Aware Reduction Collectives in MPI. HOTI 2023: 25-32
- [c489] Benjamin Michalowicz, Kaushik Kandadi Suresh, Hari Subramoni, Dhabaleswar K. D. K. Panda, Stephen W. Poole:
  Battle of the BlueFields: An In-Depth Comparison of the BlueField-2 and BlueField-3 SmartNICs. HOTI 2023: 41-48
- [c488] Hyunho Ahn, Tian Chen, Nawras Alnaasan, Aamir Shafi, Mustafa Abduljabbar, Hari Subramoni, Dhabaleswar K. Panda:
  Performance Characterization of Using Quantization for DNN Inference on Edge Devices. ICFEC 2023: 1-6
- [c487] Nicholas Contini, Bharath Ramesh, Kaushik Kandadi Suresh, Tu Tran, Benjamin Michalowicz, Mustafa Abduljabbar, Hari Subramoni, Dhabaleswar K. Panda:
  Enabling Reconfigurable HPC through MPI-based Inter-FPGA Communication. ICS 2023: 477-487
- [c486] Kaushik Kandadi Suresh, Benjamin Michalowicz, Bharath Ramesh, Nicholas Contini, Jinghan Yao, Shulei Xu, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
  A Novel Framework for Efficient Offloading of Communication Operations to Bluefield SmartNICs. IPDPS 2023: 123-133
- [c485] Qinghua Zhou, Quentin Anthony, Lang Xu, Aamir Shafi, Mustafa Abduljabbar, Hari Subramoni, Dhabaleswar K. Panda:
  Accelerating Distributed Deep Learning Training with Compression Assisted Allgather and Reduce-Scatter Communication. IPDPS 2023: 134-144
- [c484] Benjamin Michalowicz, Kaushik Kandadi Suresh, Bharath Ramesh, Aamir Shafi, Hari Subramoni, Mustafa Abduljabbar, Dhabaleswar K. Panda:
  In-Depth Evaluation of a Lower-Level Direct-Verbs API on InfiniBand-based Clusters: Early Experiences. IPDPS Workshops 2023: 354-363
- [c483] Kawthar Shafie Khorassani, Chen-Chun Chen, Hari Subramoni, Dhabaleswar K. Panda:
  Designing and Optimizing GPU-aware Nonblocking MPI Neighborhood Collective Communication for PETSc*. IPDPS 2023: 646-656
- [c482] Quentin Anthony, Ammar Ahmad Awan, Jeff Rasley, Yuxiong He, Aamir Shafi, Mustafa Abduljabbar, Hari Subramoni, Dhabaleswar K. Panda:
  MCR-DL: Mix-and-Match Communication Runtime for Deep Learning. IPDPS 2023: 996-1006
- [c481] Pouya Kousha, Vivekananda Sathu, Matthew Lieber, Hari Subramoni, Dhabaleswar K. Panda:
  Democratizing HPC Access and Use with Knowledge Graphs. SC Workshops 2023: 242-251
- [c480] Chen-Chun Chen, Kawthar Shafie Khorassani, Pouya Kousha, Qinghua Zhou, Jinghan Yao, Hari Subramoni, Dhabaleswar K. Panda:
  MPI-xCCL: A Portable MPI Library over Collective Communication Libraries for Various Accelerators. SC Workshops 2023: 847-854
- [c479] Pouya Kousha, Arpan Jain, Ayyappa Kolli, Matthew Lieber, Mingzhe Han, Nicholas Contini, Hari Subramoni, Dhabaleswar K. Panda:
  SAI: AI-Enabled Speech Assistant Interface for Science Gateways in HPC. ISC 2023: 402-424
- [c478] Benjamin Michalowicz, Kaushik Kandadi Suresh, Hari Subramoni, Dhabaleswar K. Panda, Steve Poole:
  DPU-Bench: A Micro-Benchmark Suite to Measure Offload Efficiency Of SmartNICs. PEARC 2023: 94-101
- [c477] Samuel Khuvis, Karen Tomko, Scott R. Brozell, Chen-Chun Chen, Hari Subramoni, Dhabaleswar K. Panda:
  Optimizing Amber for Device-to-Device GPU Communication. PEARC 2023: 200-205
- [i14] Hyunho Ahn, Tian Chen, Nawras Alnaasan, Aamir Shafi, Mustafa Abduljabbar, Hari Subramoni, Dhabaleswar K. Panda:
  Performance Characterization of using Quantization for DNN Inference on Edge Devices: Extended Version. CoRR abs/2303.05016 (2023)
- [i13] Quentin Anthony, Ammar Ahmad Awan, Jeff Rasley, Yuxiong He, Aamir Shafi, Mustafa Abduljabbar, Hari Subramoni, Dhabaleswar K. Panda:
  MCR-DL: Mix-and-Match Communication Runtime for Deep Learning. CoRR abs/2303.08374 (2023)
- [i12] Jinghan Yao, Nawras Alnaasan, Tian Chen, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
  Flover: A Temporal Fusion Framework for Efficient Autoregressive Model Parallel Inference. CoRR abs/2305.13484 (2023)
- 2022
- [j61] Arpan Jain, Nawras Alnaasan, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
  Optimizing Distributed DNN Training Using CPUs and BlueField-2 DPUs. IEEE Micro 42(2): 53-60 (2022)
- [c476] Kinan Al-Attar, Aamir Shafi, Mustafa Abduljabbar, Hari Subramoni, Dhabaleswar K. Panda:
  Spark Meets MPI: Towards High-Performance Communication Framework for Spark using MPI. CLUSTER 2022: 71-81
- [c475] Apan Qasem, Hartwig Anzt, Eduard Ayguadé, Katharine J. Cahill, Ramon Canal, Jany Chan, Eric Fosler-Lussier, Fritz Göbel, Arpan Jain, Marcel Koch, Mateusz Kuzak, Josep Llosa, Raghu Machiraju, Xavier Martorell, Pratik Nayak, Shameema Oottikkal, Marcin Ostasz, Dhabaleswar K. Panda, Dirk Pleiter, Rajiv Ramnath, Maria-Ribera Sancho, Alessio Sclocco, Aamir Shafi, Hanno Spreeuw, Hari Subramoni, Karen Tomko:
  Lightning Talks of EduHPC 2022. EduHPC@SC 2022: 42-49
- [c474] Qinghua Zhou, Quentin Anthony, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
  Accelerating Broadcast Communication with GPU Compression for Deep Learning Workloads. HIPC 2022: 22-31
- [c473] Nawras Alnaasan, Arpan Jain, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
  AccDP: Accelerated Data-Parallel Distributed DNN Training for Modern GPU-Based HPC Clusters. HIPC 2022: 32-41
- [c472] Bharath Ramesh, Qinghua Zhou, Aamir Shafi, Mustafa Abduljabbar, Hari Subramoni, Dhabaleswar K. Panda:
  Designing Efficient Pipelined Communication Schemes using Compression in MPI Libraries. HIPC 2022: 95-99
- [c471] Kaushik Kandadi Suresh, Akshay Paniraja Guptha, Benjamin Michalowicz, Bharath Ramesh, Mustafa Abduljabbar, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
  Efficient Personalized and Non-Personalized Alltoall Communication for Modern Multi-HCA GPU-Based Clusters. HIPC 2022: 100-104
- [c470] Kaushik Kandadi Suresh, Kawthar Shafie Khorassani, Chen-Chun Chen, Bharath Ramesh, Mustafa Abduljabbar, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
  Network Assisted Non-Contiguous Transfers for GPU-Aware MPI Libraries. HOTI 2022: 13-20
- [c469] Tu Tran, Benjamin Michalowicz, Bharath Ramesh, Hari Subramoni, Aamir Shafi, Dhabaleswar K. Panda:
  Designing Hierarchical Multi-HCA Aware Allgather in MPI. ICPP Workshops 2022: 28:1-28:10
- [c468] Dhabaleswar K. Panda:
  Challenges and Opportunities in Designing High-Performance and Scalable Middleware for HPC and AI: Past, Present, and Future. IPDPS 2022: 1
- [c467] Chen-Chun Chen, Kawthar Shafie Khorassani, Quentin G. Anthony, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
  Highly Efficient Alltoall and Alltoallv Communication Algorithms for GPU Systems. IPDPS Workshops 2022: 24-33
- [c466] Shulei Xu, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
  Arm meets Cloud: A Case Study of MPI Library Performance on AWS Arm-based HPC Cloud with Elastic Fabric Adapter. IPDPS Workshops 2022: 449-456
- [c465] Kinan Al-Attar, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
  Towards Java-based HPC using the MVAPICH2 Library: Early Experiences. IPDPS Workshops 2022: 510-519
- [c464] Nawras Alnaasan, Arpan Jain, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
  OMB-Py: Python Micro-Benchmarks for Evaluating Performance of MPI Libraries on HPC Systems. IPDPS Workshops 2022: 870-879
- [c463] Qinghua Zhou, Pouya Kousha, Quentin Anthony, Kawthar Shafie Khorassani, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
  Accelerating MPI All-to-All Communication with Online Compression on Modern GPU Clusters. ISC 2022: 3-25
- [c462] Pouya Kousha, Arpan Jain, Ayyappa Kolli, Prasanna Sainath, Hari Subramoni, Aamir Shafi, Dhabaleswar K. Panda:
  "Hey CAI" - Conversational AI Enabled User Interface for HPC Tools. ISC 2022: 87-108
- [c461] Arpan Jain, Aamir Shafi, Quentin Anthony, Pouya Kousha, Hari Subramoni, Dhabaleswar K. Panda:
  Hy-Fi: Hybrid Five-Dimensional Parallel DNN Training on High-Performance GPU Clusters. ISC 2022: 109-130
- [c460] Kawthar Shafie Khorassani, Chen-Chun Chen, Bharath Ramesh, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
  High Performance MPI over the Slingshot Interconnect: Early Experiences. PEARC 2022: 15:1-15:7
- [e8] Dhabaleswar K. Panda, Michael B. Sullivan:
  Supercomputing Frontiers - 7th Asian Conference, SCFA 2022, Singapore, March 1-3, 2022, Proceedings. Lecture Notes in Computer Science 13214, Springer 2022, ISBN 978-3-031-10418-3
- 2021
- [j60] Dhabaleswar Kumar Panda, Hari Subramoni, Ching-Hsiang Chu, Mohammadreza Bayatpour:
  The MVAPICH project: Transforming research into high-performance MPI library for HPC community. J. Comput. Sci. 52: 101208 (2021)
- [c459] Kawthar Shafie Khorassani, Ching-Hsiang Chu, Quentin G. Anthony, Hari Subramoni, Dhabaleswar K. Panda:
  Adaptive and Hierarchical Large Message All-to-all Communication Algorithms for Large-scale Dense GPU Systems. CCGRID 2021: 113-122
- [c458] Aamir Shafi, Jahanzeb Maqbool Hashmi, Hari Subramoni, Dhabaleswar K. D. K. Panda:
  Efficient MPI-based Communication for GPU-Accelerated Dask Applications. CCGRID 2021: 277-286
- [c457] Bharath Ramesh, Jahanzeb Maqbool Hashmi, Shulei Xu, Aamir Shafi, Seyedeh Mahdieh Ghazimirsaeed, Mohammadreza Bayatpour, Hari Subramoni, Dhabaleswar K. Panda:
  Towards Architecture-aware Hierarchical Communication Trees on Modern HPC Systems. HiPC 2021: 272-281
- [c456] Yuntian He, Saket Gurukar, Pouya Kousha, Hari Subramoni, Dhabaleswar K. Panda, Srinivasan Parthasarathy:
  DistMILE: A Distributed Multi-Level Framework for Scalable Graph Embedding. HiPC 2021: 282-291
- [c455] Kaushik Kandadi Suresh, Bharath Ramesh, Chen-Chun Chen, Seyedeh Mahdieh Ghazimirsaeed, Mohammadreza Bayatpour, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
  Layout-aware Hardware-assisted Designs for Derived Data Types in MPI. HiPC 2021: 302-311
- [c454] Nick Sarkauskas, Mohammadreza Bayatpour, Tu Tran, Bharath Ramesh, Hari Subramoni, Dhabaleswar K. Panda:
  Large-Message Nonblocking MPI_Iallgather and MPI Ibcast Offload via BlueField-2 DPU. HiPC 2021: 388-393
- [c453] Arpan Jain, Nawras Alnaasan, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
  Accelerating CPU-based Distributed DNN Training on Modern HPC Clusters using BlueField-2 DPUs. HOTI 2021: 17-24
- [c452] Q. Zhou, C. Chu, N. S. Kumar, Pouya Kousha, Seyedeh Mahdieh Ghazimirsaeed, Hari Subramoni, Dhabaleswar K. Panda:
  Designing High-Performance MPI Libraries with On-the-fly Compression for Modern GPU Clusters*. IPDPS 2021: 444-453
- [c451] Arpan Jain, Tim Moon, Tom Benson, Hari Subramoni, Sam Adé Jacobs, Dhabaleswar K. Panda, Brian Van Essen:
  SUPER: SUb-Graph Parallelism for TransformERs. IPDPS 2021: 629-638
- [c450] Quentin Anthony, Lang Xu, Hari Subramoni, Dhabaleswar K. D. K. Panda:
  Scaling Single-Image Super-Resolution Training on Modern HPC Clusters: Early Experiences. IPDPS Workshops 2021: 923-932
- [c449] Mohammadreza Bayatpour, Nick Sarkauskas, Hari Subramoni, Jahanzeb Maqbool Hashmi, Dhabaleswar K. Panda:
  BluesMPI: Efficient MPI Non-blocking Alltoall Offloading Designs on Modern BlueField Smart NICs. ISC 2021: 18-37
- [c448] Kawthar Shafie Khorassani, Jahanzeb Maqbool Hashmi, Ching-Hsiang Chu, Chen-Chun Chen, Hari Subramoni, Dhabaleswar K. Panda:
  Designing a ROCm-Aware MPI Library for AMD GPUs: Early Experiences. ISC 2021: 118-136
- [c447] Pouya Kousha, Kamal Raj Sankarapandian Dayala Ganesh Ram, Mansa Kedia, Hari Subramoni, Arpan Jain, Aamir Shafi, Dhabaleswar K. Panda, Trey Dockendorf, Heechang Na, Karen Tomko:
  INAM: Cross-stack Profiling and Analysis of Communication in MPI-based Applications. PEARC 2021: 14:1-14:11
- [i11] Aamir Shafi, Jahanzeb Maqbool Hashmi, Hari Subramoni, Dhabaleswar K. Panda:
  Efficient MPI-based Communication for GPU-Accelerated Dask Applications. CoRR abs/2101.08878 (2021)
- [i10] Pouya Kousha, Quentin Anthony, Hari Subramoni, Dhabaleswar K. Panda:
  Cross-layer Visualization and Profiling of Network and I/O Communication for HPC Clusters. CoRR abs/2109.08329 (2021)
- [i9] Nawras Alnaasan, Arpan Jain, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
  OMB-Py: Python Micro-Benchmarks for Evaluating Performance of MPI Libraries on HPC Systems. CoRR abs/2110.10659 (2021)
- 2020
- [j59] Sourav Chakraborty, Ignacio Laguna, Murali Emani, Kathryn M. Mohror, Dhabaleswar K. Panda, Martin Schulz, Hari Subramoni:
  EReinit: Scalable and efficient fault-tolerance for bulk-synchronous MPI applications. Concurr. Comput. Pract. Exp. 32(3) (2020)
- [j58] Jahanzeb Maqbool Hashmi, Ching-Hsiang Chu, Sourav Chakraborty, Mohammadreza Bayatpour, Hari Subramoni, Dhabaleswar K. Panda:
  FALCON-X: Zero-copy MPI derived datatype processing on modern CPU and GPU architectures. J. Parallel Distributed Comput. 144: 1-13 (2020)
- [j57] Ammar Ahmad Awan, Arpan Jain, Ching-Hsiang Chu, Hari Subramoni, Dhabaleswar K. Panda:
  Communication Profiling and Characterization of Deep-Learning Workloads on Clusters With High-Performance Interconnects. IEEE Micro 40(1): 35-43 (2020)
- [c446] Mohammadreza Bayatpour, Seyedeh Mahdieh Ghazimirsaeed, Shulei Xu, Hari Subramoni, Dhabaleswar K. Panda:
  Design and Characterization of InfiniBand Hardware Tag Matching in MPI. CCGRID 2020: 101-110
- [c445] Ching-Hsiang Chu, Kawthar Shafie Khorassani, Qinghua Zhou, Hari Subramoni, Dhabaleswar K. Panda:
  Dynamic Kernel Fusion for Bulk Non-contiguous Data Transfer on GPU Clusters. CLUSTER 2020: 130-141
- [c444] Aamir Shafi, Jahanzeb Maqbool Hashmi, Hari Subramoni, Dhabaleswar K. Panda:
  Blink: Towards Efficient RDMA-based Communication Coroutines for Parallel Python Applications. HiPC 2020: 111-120
- [c443] Ching-Hsiang Chu, Pouya Kousha, Ammar Ahmad Awan, Kawthar Shafie Khorassani, Hari Subramoni, Dhabaleswar K. D. K. Panda:
  NV-group: link-efficient reduction for distributed deep learning on modern dense GPU systems. ICS 2020: 6:1-6:12
- [c442] Jahanzeb Maqbool Hashmi, Shulei Xu, Bharath Ramesh, Mohammadreza Bayatpour, Hari Subramoni, Dhabaleswar K. D. K. Panda:
  Machine-agnostic and Communication-aware Designs for MPI on Emerging Architectures. IPDPS 2020: 32-41
- [c441] Amit Ruhela, Shulei Xu, Karthik Vadambacheri Manian, Hari Subramoni, Dhabaleswar K. Panda:
  Analyzing and Understanding the Impact of Interconnect Performance on HPC, Big Data, and Deep Learning Applications: A Case Study with InfiniBand EDR and HDR. IPDPS Workshops 2020: 869-878
- [c440] Kaushik Kandadi Suresh, Bharath Ramesh, Seyedeh Mahdieh Ghazimirsaeed, Mohammadreza Bayatpour, Jahanzeb Maqbool Hashmi, Hari Subramoni, Dhabaleswar K. D. K. Panda:
  Performance Characterization of Network Mechanisms for Non-Contiguous Data Transfers in MPI. IPDPS Workshops 2020: 896-905
- [c439] Quentin Anthony, Ammar Ahmad Awan, Arpan Jain, Hari Subramoni, Dhabaleswar K. D. K. Panda:
  Efficient Training of Semantic Image Segmentation on Summit using Horovod and MVAPICH2-GDR. IPDPS Workshops 2020: 1015-1023
- [c438] Bharath Ramesh, Kaushik Kandadi Suresh, Nick Sarkauskas, Mohammadreza Bayatpour, Jahanzeb Maqbool Hashmi, Hari Subramoni, Dhabaleswar K. Panda:
  Scalable MPI Collectives using SHARP: Large Scale Performance Evaluation on the TACC Frontera System. ExaMPI@SC 2020: 11-20
- [c437] Seyedeh Mahdieh Ghazimirsaeed, Quentin Anthony, Aamir Shafi, Hari Subramoni, Dhabaleswar K. D. K. Panda:
  Accelerating GPU-based Machine Learning in Python using MPI Library: A Case Study with MVAPICH2-GDR. MLHPC/AI4S@SC 2020: 17-28
- [c436] Shulei Xu, Seyedeh Mahdieh Ghazimirsaeed, Jahanzeb Maqbool Hashmi, Hari Subramoni, Dhabaleswar K. Panda:
  MPI Meets Cloud: Case Study with Amazon EC2 and Microsoft Azure. IPDRM@SC 2020: 41-48
- [c435] Arpan Jain, Ammar Ahmad Awan, Asmaa M. Aljuhani, Jahanzeb Maqbool Hashmi, Quentin G. Anthony, Hari Subramoni, Dhabaleswar K. Panda, Raghu Machiraju, Anil Parwani:
  GEMS: GPU-enabled memory-aware model-parallelism system for distributed DNN training. SC 2020: 45
- [c434] Samuel Khuvis, Karen Tomko, Jahanzeb Maqbool Hashmi, Dhabaleswar K. Panda:
  Exploring Hybrid MPI+Kokkos Tasks Programming Model. PAW-ATM@SC 2020: 66-73
- [c433] Ammar Ahmad Awan, Arpan Jain, Quentin Anthony, Hari Subramoni, Dhabaleswar K. Panda:
  HyPar-Flow: Exploiting MPI and Keras for Scalable Hybrid-Parallel DNN Training with TensorFlow. ISC 2020: 83-103
- [c432] Mohammadreza Bayatpour, Jahanzeb Maqbool Hashmi, Sourav Chakraborty, Kaushik Kandadi Suresh, Seyedeh Mahdieh Ghazimirsaeed, Bharath Ramesh, Hari Subramoni, Dhabaleswar K. Panda:
  Communication-Aware Hardware-Assisted MPI Overlap Engine. ISC 2020: 517-535
- [c431]