


Hari Subramoni
2020 – today
2023
- [j16] Kawthar Shafie Khorassani, Chen-Chun Chen, Bharath Ramesh, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda: High Performance MPI over the Slingshot Interconnect. J. Comput. Sci. Technol. 38(1): 128-145 (2023)
- [j15] Kaushik Kandadi Suresh, Kawthar Shafie Khorassani, Chen-Chun Chen, Bharath Ramesh, Mustafa Abduljabbar, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda: Network-Assisted Noncontiguous Transfers for GPU-Aware MPI Libraries. IEEE Micro 43(2): 131-139 (2023)
- [c133] Pouya Kousha, Arpan Jain, Ayyappa Kolli, Matthew Lieber, Mingzhe Han, Nicholas Contini, Hari Subramoni, Dhabaleswar K. Panda: SAI: AI-Enabled Speech Assistant Interface for Science Gateways in HPC. ISC 2023: 402-424
- [i9] Hyunho Ahn, Tian Chen, Nawras Alnaasan, Aamir Shafi, Mustafa Abduljabbar, Hari Subramoni, Dhabaleswar K. Panda: Performance Characterization of using Quantization for DNN Inference on Edge Devices: Extended Version. CoRR abs/2303.05016 (2023)
- [i8] Quentin Anthony, Ammar Ahmad Awan, Jeff Rasley, Yuxiong He, Aamir Shafi, Mustafa Abduljabbar, Hari Subramoni, Dhabaleswar K. Panda: MCR-DL: Mix-and-Match Communication Runtime for Deep Learning. CoRR abs/2303.08374 (2023)

2022
- [j14] Arpan Jain, Nawras Alnaasan, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda: Optimizing Distributed DNN Training Using CPUs and BlueField-2 DPUs. IEEE Micro 42(2): 53-60 (2022)
- [c132] Kinan Al-Attar, Aamir Shafi, Mustafa Abduljabbar, Hari Subramoni, Dhabaleswar K. Panda: Spark Meets MPI: Towards High-Performance Communication Framework for Spark using MPI. CLUSTER 2022: 71-81
- [c131] Apan Qasem, Hartwig Anzt, Eduard Ayguadé, Katharine Cahill, Ramon Canal, Jany Chan, Eric Fosler-Lussier, Fritz Göbel, Arpan Jain, Marcel Koch, Mateusz Kuzak, Josep Llosa, Raghu Machiraju, Xavier Martorell, Pratik Nayak, Shameema Oottikkal, Marcin Ostasz, Dhabaleswar K. Panda, Dirk Pleiter, Rajiv Ramnath, Maria-Ribera Sancho, Alessio Sclocco, Aamir Shafi, Hanno Spreeuw, Hari Subramoni, Karen Tomko: Lightning Talks of EduHPC 2022. EduHPC@SC 2022: 42-49
- [c130] Qinghua Zhou, Quentin Anthony, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda: Accelerating Broadcast Communication with GPU Compression for Deep Learning Workloads. HIPC 2022: 22-31
- [c129] Nawras Alnaasan, Arpan Jain, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda: AccDP: Accelerated Data-Parallel Distributed DNN Training for Modern GPU-Based HPC Clusters. HIPC 2022: 32-41
- [c128] Bharath Ramesh, Qinghua Zhou, Aamir Shafi, Mustafa Abduljabbar, Hari Subramoni, Dhabaleswar K. Panda: Designing Efficient Pipelined Communication Schemes using Compression in MPI Libraries. HIPC 2022: 95-99
- [c127] Kaushik Kandadi Suresh, Akshay Paniraja Guptha, Benjamin Michalowicz, Bharath Ramesh, Mustafa Abduljabbar, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda: Efficient Personalized and Non-Personalized Alltoall Communication for Modern Multi-HCA GPU-Based Clusters. HIPC 2022: 100-104
- [c126] Kaushik Kandadi Suresh, Kawthar Shafie Khorassani, Chen-Chun Chen, Bharath Ramesh, Mustafa Abduljabbar, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda: Network Assisted Non-Contiguous Transfers for GPU-Aware MPI Libraries. HOTI 2022: 13-20
- [c125] Tu Tran, Benjamin Michalowicz, Bharath Ramesh, Hari Subramoni, Aamir Shafi, Dhabaleswar K. Panda: Designing Hierarchical Multi-HCA Aware Allgather in MPI. ICPP Workshops 2022: 28:1-28:10
- [c124] Chen-Chun Chen, Kawthar Shafie Khorassani, Quentin G. Anthony, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda: Highly Efficient Alltoall and Alltoallv Communication Algorithms for GPU Systems. IPDPS Workshops 2022: 24-33
- [c123] Shulei Xu, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda: Arm meets Cloud: A Case Study of MPI Library Performance on AWS Arm-based HPC Cloud with Elastic Fabric Adapter. IPDPS Workshops 2022: 449-456
- [c122] Kinan Al-Attar, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda: Towards Java-based HPC using the MVAPICH2 Library: Early Experiences. IPDPS Workshops 2022: 510-519
- [c121] Nawras Alnaasan, Arpan Jain, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda: OMB-Py: Python Micro-Benchmarks for Evaluating Performance of MPI Libraries on HPC Systems. IPDPS Workshops 2022: 870-879
- [c120] Qinghua Zhou, Pouya Kousha, Quentin Anthony, Kawthar Shafie Khorassani, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda: Accelerating MPI All-to-All Communication with Online Compression on Modern GPU Clusters. ISC 2022: 3-25
- [c119] Pouya Kousha, Arpan Jain, Ayyappa Kolli, Prasanna Sainath, Hari Subramoni, Aamir Shafi, Dhabaleswar K. Panda: "Hey CAI" - Conversational AI Enabled User Interface for HPC Tools. ISC 2022: 87-108
- [c118] Arpan Jain, Aamir Shafi, Quentin Anthony, Pouya Kousha, Hari Subramoni, Dhabaleswar K. Panda: Hy-Fi: Hybrid Five-Dimensional Parallel DNN Training on High-Performance GPU Clusters. ISC 2022: 109-130
- [c117] Kawthar Shafie Khorassani, Chen-Chun Chen, Bharath Ramesh, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda: High Performance MPI over the Slingshot Interconnect: Early Experiences. PEARC 2022: 15:1-15:7

2021
- [j13] Dhabaleswar Kumar Panda, Hari Subramoni, Ching-Hsiang Chu, Mohammadreza Bayatpour: The MVAPICH project: Transforming research into high-performance MPI library for HPC community. J. Comput. Sci. 52: 101208 (2021)
- [c116] Kawthar Shafie Khorassani, Ching-Hsiang Chu, Quentin G. Anthony, Hari Subramoni, Dhabaleswar K. Panda: Adaptive and Hierarchical Large Message All-to-all Communication Algorithms for Large-scale Dense GPU Systems. CCGRID 2021: 113-122
- [c115] Aamir Shafi, Jahanzeb Maqbool Hashmi, Hari Subramoni, Dhabaleswar K. Panda: Efficient MPI-based Communication for GPU-Accelerated Dask Applications. CCGRID 2021: 277-286
- [c114] Bharath Ramesh, Jahanzeb Maqbool Hashmi, Shulei Xu, Aamir Shafi, Seyedeh Mahdieh Ghazimirsaeed, Mohammadreza Bayatpour, Hari Subramoni, Dhabaleswar K. Panda: Towards Architecture-aware Hierarchical Communication Trees on Modern HPC Systems. HiPC 2021: 272-281
- [c113] Yuntian He, Saket Gurukar, Pouya Kousha, Hari Subramoni, Dhabaleswar K. Panda, Srinivasan Parthasarathy: DistMILE: A Distributed Multi-Level Framework for Scalable Graph Embedding. HiPC 2021: 282-291
- [c112] Kaushik Kandadi Suresh, Bharath Ramesh, Chen-Chun Chen, Seyedeh Mahdieh Ghazimirsaeed, Mohammadreza Bayatpour, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda: Layout-aware Hardware-assisted Designs for Derived Data Types in MPI. HiPC 2021: 302-311
- [c111] Nick Sarkauskas, Mohammadreza Bayatpour, Tu Tran, Bharath Ramesh, Hari Subramoni, Dhabaleswar K. Panda: Large-Message Nonblocking MPI_Iallgather and MPI_Ibcast Offload via BlueField-2 DPU. HiPC 2021: 388-393
- [c110] Arpan Jain, Nawras Alnaasan, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda: Accelerating CPU-based Distributed DNN Training on Modern HPC Clusters using BlueField-2 DPUs. HOTI 2021: 17-24
- [c109] Q. Zhou, C. Chu, N. S. Kumar, Pouya Kousha, Seyedeh Mahdieh Ghazimirsaeed, Hari Subramoni, Dhabaleswar K. Panda: Designing High-Performance MPI Libraries with On-the-fly Compression for Modern GPU Clusters. IPDPS 2021: 444-453
- [c108] Arpan Jain, Tim Moon, Tom Benson, Hari Subramoni, Sam Adé Jacobs, Dhabaleswar K. Panda, Brian Van Essen: SUPER: SUb-Graph Parallelism for TransformERs. IPDPS 2021: 629-638
- [c107] Quentin Anthony, Lang Xu, Hari Subramoni, Dhabaleswar K. Panda: Scaling Single-Image Super-Resolution Training on Modern HPC Clusters: Early Experiences. IPDPS Workshops 2021: 923-932
- [c106] Mohammadreza Bayatpour, Nick Sarkauskas, Hari Subramoni, Jahanzeb Maqbool Hashmi, Dhabaleswar K. Panda: BluesMPI: Efficient MPI Non-blocking Alltoall Offloading Designs on Modern BlueField Smart NICs. ISC 2021: 18-37
- [c105] Kawthar Shafie Khorassani, Jahanzeb Maqbool Hashmi, Ching-Hsiang Chu, Chen-Chun Chen, Hari Subramoni, Dhabaleswar K. Panda: Designing a ROCm-Aware MPI Library for AMD GPUs: Early Experiences. ISC 2021: 118-136
- [c104] Pouya Kousha, Kamal Raj Sankarapandian Dayala Ganesh Ram, Mansa Kedia, Hari Subramoni, Arpan Jain, Aamir Shafi, Dhabaleswar K. Panda, Trey Dockendorf, Heechang Na, Karen Tomko: INAM: Cross-stack Profiling and Analysis of Communication in MPI-based Applications. PEARC 2021: 14:1-14:11
- [i7] Aamir Shafi, Jahanzeb Maqbool Hashmi, Hari Subramoni, Dhabaleswar K. Panda: Efficient MPI-based Communication for GPU-Accelerated Dask Applications. CoRR abs/2101.08878 (2021)
- [i6] Pouya Kousha, Quentin Anthony, Hari Subramoni, Dhabaleswar K. Panda: Cross-layer Visualization and Profiling of Network and I/O Communication for HPC Clusters. CoRR abs/2109.08329 (2021)
- [i5] Nawras Alnaasan, Arpan Jain, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda: OMB-Py: Python Micro-Benchmarks for Evaluating Performance of MPI Libraries on HPC Systems. CoRR abs/2110.10659 (2021)

2020
- [j12] Sourav Chakraborty, Ignacio Laguna, Murali Emani, Kathryn Mohror, Dhabaleswar K. Panda, Martin Schulz, Hari Subramoni: EReinit: Scalable and efficient fault-tolerance for bulk-synchronous MPI applications. Concurr. Comput. Pract. Exp. 32(3) (2020)
- [j11] Jahanzeb Maqbool Hashmi, Ching-Hsiang Chu, Sourav Chakraborty, Mohammadreza Bayatpour, Hari Subramoni, Dhabaleswar K. Panda: FALCON-X: Zero-copy MPI derived datatype processing on modern CPU and GPU architectures. J. Parallel Distributed Comput. 144: 1-13 (2020)
- [j10] Ammar Ahmad Awan, Arpan Jain, Ching-Hsiang Chu, Hari Subramoni, Dhabaleswar K. Panda: Communication Profiling and Characterization of Deep-Learning Workloads on Clusters With High-Performance Interconnects. IEEE Micro 40(1): 35-43 (2020)
- [c103] Mohammadreza Bayatpour, Seyedeh Mahdieh Ghazimirsaeed, Shulei Xu, Hari Subramoni, Dhabaleswar K. Panda: Design and Characterization of InfiniBand Hardware Tag Matching in MPI. CCGRID 2020: 101-110
- [c102] Ching-Hsiang Chu, Kawthar Shafie Khorassani, Qinghua Zhou, Hari Subramoni, Dhabaleswar K. Panda: Dynamic Kernel Fusion for Bulk Non-contiguous Data Transfer on GPU Clusters. CLUSTER 2020: 130-141
- [c101] Aamir Shafi, Jahanzeb Maqbool Hashmi, Hari Subramoni, Dhabaleswar K. Panda: Blink: Towards Efficient RDMA-based Communication Coroutines for Parallel Python Applications. HiPC 2020: 111-120
- [c100] Ching-Hsiang Chu, Pouya Kousha, Ammar Ahmad Awan, Kawthar Shafie Khorassani, Hari Subramoni, Dhabaleswar K. Panda: NV-group: link-efficient reduction for distributed deep learning on modern dense GPU systems. ICS 2020: 6:1-6:12
- [c99] Jahanzeb Maqbool Hashmi, Shulei Xu, Bharath Ramesh, Mohammadreza Bayatpour, Hari Subramoni, Dhabaleswar K. Panda: Machine-agnostic and Communication-aware Designs for MPI on Emerging Architectures. IPDPS 2020: 32-41
- [c98] Amit Ruhela, Shulei Xu, Karthik Vadambacheri Manian, Hari Subramoni, Dhabaleswar K. Panda: Analyzing and Understanding the Impact of Interconnect Performance on HPC, Big Data, and Deep Learning Applications: A Case Study with InfiniBand EDR and HDR. IPDPS Workshops 2020: 869-878
- [c97] Kaushik Kandadi Suresh, Bharath Ramesh, Seyedeh Mahdieh Ghazimirsaeed, Mohammadreza Bayatpour, Jahanzeb Maqbool Hashmi, Hari Subramoni, Dhabaleswar K. Panda: Performance Characterization of Network Mechanisms for Non-Contiguous Data Transfers in MPI. IPDPS Workshops 2020: 896-905
- [c96] Quentin Anthony, Ammar Ahmad Awan, Arpan Jain, Hari Subramoni, Dhabaleswar K. Panda: Efficient Training of Semantic Image Segmentation on Summit using Horovod and MVAPICH2-GDR. IPDPS Workshops 2020: 1015-1023
- [c95] Bharath Ramesh, Kaushik Kandadi Suresh, Nick Sarkauskas, Mohammadreza Bayatpour, Jahanzeb Maqbool Hashmi, Hari Subramoni, Dhabaleswar K. Panda: Scalable MPI Collectives using SHARP: Large Scale Performance Evaluation on the TACC Frontera System. ExaMPI@SC 2020: 11-20
- [c94] Seyedeh Mahdieh Ghazimirsaeed, Quentin Anthony, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda: Accelerating GPU-based Machine Learning in Python using MPI Library: A Case Study with MVAPICH2-GDR. MLHPC/AI4S@SC 2020: 17-28
- [c93] Shulei Xu, Seyedeh Mahdieh Ghazimirsaeed, Jahanzeb Maqbool Hashmi, Hari Subramoni, Dhabaleswar K. Panda: MPI Meets Cloud: Case Study with Amazon EC2 and Microsoft Azure. IPDRM@SC 2020: 41-48
- [c92] Arpan Jain, Ammar Ahmad Awan, Asmaa M. Aljuhani, Jahanzeb Maqbool Hashmi, Quentin G. Anthony, Hari Subramoni, Dhabaleswar K. Panda, Raghu Machiraju, Anil Parwani: GEMS: GPU-enabled memory-aware model-parallelism system for distributed DNN training. SC 2020: 45
- [c91] Ammar Ahmad Awan, Arpan Jain, Quentin Anthony, Hari Subramoni, Dhabaleswar K. Panda: HyPar-Flow: Exploiting MPI and Keras for Scalable Hybrid-Parallel DNN Training with TensorFlow. ISC 2020: 83-103
- [c90] Mohammadreza Bayatpour, Jahanzeb Maqbool Hashmi, Sourav Chakraborty, Kaushik Kandadi Suresh, Seyedeh Mahdieh Ghazimirsaeed, Bharath Ramesh, Hari Subramoni, Dhabaleswar K. Panda: Communication-Aware Hardware-Assisted MPI Overlap Engine. ISC 2020: 517-535
- [c89] Pouya Kousha, Kamal Raj S. D., Hari Subramoni, Dhabaleswar K. Panda, Heechang Na, Trey Dockendorf, Karen Tomko: Accelerated Real-time Network Monitoring and Profiling at Scale using OSU INAM. PEARC 2020: 215-223
2010 – 2019
2019
- [j9] Amit Ruhela, Hari Subramoni, Sourav Chakraborty, Mohammadreza Bayatpour, Pouya Kousha, Dhabaleswar K. Panda: Efficient design for MPI asynchronous progress without dedicated resources. Parallel Comput. 85: 13-26 (2019)
- [j8] Ammar Ahmad Awan, Karthik Vadambacheri Manian, Ching-Hsiang Chu, Hari Subramoni, Dhabaleswar K. Panda: Optimized large-message broadcast for deep learning workloads: MPI, MPI+NCCL, or NCCL2? Parallel Comput. 85: 141-152 (2019)
- [j7] Ching-Hsiang Chu, Xiaoyi Lu, Ammar Ahmad Awan, Hari Subramoni, Bracy Elton, Dhabaleswar K. Panda: Exploiting Hardware Multicast and GPUDirect RDMA for Efficient Broadcast. IEEE Trans. Parallel Distributed Syst. 30(3): 575-588 (2019)
- [c88] Karthik Vadambacheri Manian, A. A. Ammar, Amit Ruhela, Ching-Hsiang Chu, Hari Subramoni, Dhabaleswar K. Panda: Characterizing CUDA Unified Memory (UM)-Aware MPI Designs on Modern GPU Architectures. GPGPU@ASPLOS 2019: 43-52
- [c87] Jahanzeb Maqbool Hashmi, Sourav Chakraborty, Mohammadreza Bayatpour, Hari Subramoni, Dhabaleswar K. Panda: Design and Characterization of Shared Address Space MPI Collectives on Modern Architectures. CCGRID 2019: 410-419
- [c86] Ammar Ahmad Awan, Jeroen Bédorf, Ching-Hsiang Chu, Hari Subramoni, Dhabaleswar K. Panda: Scalable Distributed DNN Training using TensorFlow and CUDA-Aware MPI: Characterization, Designs, and Performance Evaluation. CCGRID 2019: 498-507
- [c85] Arpan Jain, Ammar Ahmad Awan, Quentin Anthony, Hari Subramoni, Dhabaleswar K. Panda: Performance Characterization of DNN Training using TensorFlow and PyTorch on Modern Clusters. CLUSTER 2019: 1-11
- [c84] Pouya Kousha, Bharath Ramesh, Kaushik Kandadi Suresh, Ching-Hsiang Chu, Arpan Jain, Nick Sarkauskas, Hari Subramoni, Dhabaleswar K. Panda: Designing a Profiling and Visualization Tool for Scalable and In-depth Analysis of High-Performance GPU Clusters. HiPC 2019: 93-102
- [c83] Ching-Hsiang Chu, Jahanzeb Maqbool Hashmi, Kawthar Shafie Khorassani, Hari Subramoni, Dhabaleswar K. Panda: High-Performance Adaptive MPI Derived Datatype Communication for Modern Multi-GPU Systems. HiPC 2019: 267-276
- [c82] Sourav Chakraborty, Shulei Xu, Hari Subramoni, Dhabaleswar K. Panda: Designing Scalable and High-Performance MPI Libraries on Amazon Elastic Fabric Adapter. Hot Interconnects 2019: 40-44
- [c81] Ammar Ahmad Awan, Arpan Jain, Ching-Hsiang Chu, Hari Subramoni, Dhabaleswar K. Panda: Communication Profiling and Characterization of Deep Learning Workloads on Clusters with High-Performance Interconnects. Hot Interconnects 2019: 49-53
- [c80] Jahanzeb Maqbool Hashmi, Sourav Chakraborty, Mohammadreza Bayatpour, Hari Subramoni, Dhabaleswar K. Panda: FALCON: Efficient Designs for Zero-Copy MPI Datatype Processing on Emerging Architectures. IPDPS 2019: 355-364
- [c79] Dhabaleswar K. Panda, Ammar Ahmad Awan, Hari Subramoni: High performance distributed deep learning: a beginner's guide. PPoPP 2019: 452-454
- [c78] Amit Ruhela, Bharath Ramesh, Sourav Chakraborty, Hari Subramoni, Jahanzeb Maqbool Hashmi, Dhabaleswar K. Panda: Leveraging Network-level parallelism with Multiple Process-Endpoints for MPI Broadcast. IPDRM@SC 2019: 34-41
- [c77] Shulei Xu, Jahanzeb Maqbool Hashmi, Sourav Chakraborty, Hari Subramoni, Dhabaleswar K. Panda: Design and Evaluation of Shared Memory Communication Benchmarks on Emerging Architectures using MVAPICH2. IPDRM@SC 2019: 42-49
- [c76] Arpan Jain, Ammar Ahmad Awan, Hari Subramoni, Dhabaleswar K. Panda: Scaling TensorFlow, PyTorch, and MXNet using MVAPICH2 for High-Performance Deep Learning on Frontera. DLS@SC 2019: 76-83
- [c75] Karthik Vadambacheri Manian, Ching-Hsiang Chu, Ammar Ahmad Awan, Kawthar Shafie Khorassani, Hari Subramoni: OMB-UM: Design, Implementation, and Evaluation of CUDA Unified Memory Aware MPI Benchmarks. PMBS@SC 2019: 82-92
- [c74] Kawthar Shafie Khorassani, Ching-Hsiang Chu, Hari Subramoni, Dhabaleswar K. Panda: Performance Evaluation of MPI Libraries on GPU-Enabled OpenPOWER Architectures: Early Experiences. ISC Workshops 2019: 361-378
- [i4] Ammar Ahmad Awan, Arpan Jain, Quentin Anthony, Hari Subramoni, Dhabaleswar K. Panda: HyPar-Flow: Exploiting MPI and Keras for Scalable Hybrid-Parallel DNN Training using TensorFlow. CoRR abs/1911.05146 (2019)

2018
- [j6] Dhabaleswar K. Panda, Xiaoyi Lu, Hari Subramoni: Networking and communication challenges for post-exascale systems. Frontiers Inf. Technol. Electron. Eng. 19(10): 1230-1235 (2018)
- [j5] Srinivasan Ramesh, Aurèle Mahéo, Sameer Shende, Allen D. Malony, Hari Subramoni, Amit Ruhela, Dhabaleswar K. Panda: MPI performance engineering with the MPI tool interface: The integration of MVAPICH and TAU. Parallel Comput. 77: 19-37 (2018)
- [c73] Mohammadreza Bayatpour, Jahanzeb Maqbool Hashmi, Sourav Chakraborty, Hari Subramoni, Pouya Kousha, Dhabaleswar K. Panda: SALaR: Scalable and Adaptive Designs for Large Message Reduction Collectives. CLUSTER 2018: 12-23
- [c72] Ammar Ahmad Awan, Ching-Hsiang Chu, Hari Subramoni, Xiaoyi Lu, Dhabaleswar K. Panda: OC-DNN: Exploiting Advanced Unified Memory Capabilities in CUDA 9 and Volta GPUs for Out-of-Core DNN Training. HiPC 2018: 143-152
- [c71] Jahanzeb Maqbool Hashmi, Sourav Chakraborty, Mohammadreza Bayatpour, Hari Subramoni, Dhabaleswar K. Panda: Designing Efficient Shared Address Space Reduction Collectives for Multi-/Many-cores. IPDPS 2018: 1020-1029
- [c70] Ammar Ahmad Awan, Ching-Hsiang Chu, Hari Subramoni, Dhabaleswar K. Panda: Optimized Broadcast for Deep Learning Workloads on Dense-GPU InfiniBand Clusters: MPI or NCCL? EuroMPI 2018: 2:1-2:9
- [c69] Mingzhe Li, Xiaoyi Lu, Hari Subramoni, Dhabaleswar K. Panda: Multi-Threading and Lock-Free MPI RMA Based Graph Processing on KNL and POWER Architectures. EuroMPI 2018: 4:1-4:10
- [c68] Amit Ruhela, Hari Subramoni, Sourav Chakraborty, Mohammadreza Bayatpour, Pouya Kousha, Dhabaleswar K. Panda: Efficient Asynchronous Communication Progress for MPI without Dedicated Resources. EuroMPI 2018: 14:1-14:11
- [c67] Sourav Chakraborty, Mohammadreza Bayatpour, Jahanzeb Maqbool Hashmi, Hari Subramoni, Dhabaleswar K. Panda: Cooperative rendezvous protocols for improved performance and overlap. SC 2018: 28:1-28:13
- [i3] Ammar Ahmad Awan, Jeroen Bédorf, Ching-Hsiang Chu, Hari Subramoni, Dhabaleswar K. Panda: Scalable Distributed DNN Training using TensorFlow and CUDA-Aware MPI: Characterization, Designs, and Performance Evaluation. CoRR abs/1810.11112 (2018)

2017
- [c66] Sourav Chakraborty, Hari Subramoni, Dhabaleswar K. Panda: Contention-Aware Kernel-Assisted MPI Collectives for Multi-/Many-Core Systems. CLUSTER 2017: 13-24
- [c65] Hari Subramoni, Xiaoyi Lu, Dhabaleswar K. Panda: A Scalable Network-Based Performance Analysis Tool for MPI on Large-Scale HPC Systems. CLUSTER 2017: 354-358
- [c64] Mingzhe Li, Xiaoyi Lu, Hari Subramoni, Dhabaleswar K. Panda: Designing Registration Caching Free High-Performance MPI Library with Implicit On-Demand Paging (ODP) of InfiniBand. HiPC 2017: 62-71
- [c63] Jahanzeb Maqbool Hashmi, Khaled Hamidouche, Hari Subramoni, Dhabaleswar K. Panda: Kernel-Assisted Communication Engine for MPI on Emerging Manycore Processors. HiPC 2017: 84-93
- [c62] Ching-Hsiang Chu, Xiaoyi Lu, Ammar Ahmad Awan, Hari Subramoni, Jahanzeb Maqbool Hashmi, Bracy Elton, Dhabaleswar K. Panda: Efficient and Scalable Multi-Source Streaming Broadcast on GPU Clusters for Deep Learning. ICPP 2017: 161-170
- [c61] Jahanzeb Maqbool Hashmi, Mingzhe Li, Hari Subramoni, Dhabaleswar K. Panda: Exploiting and Evaluating OpenSHMEM on KNL Architecture. OpenSHMEM 2017: 143-158
- [c60] Srinivasan Ramesh, Aurèle Mahéo, Sameer Shende, Allen D. Malony, Hari Subramoni, Dhabaleswar K. Panda: MPI performance engineering with the MPI tool interface: the integration of MVAPICH and TAU. EuroMPI/USA 2017: 16:1-16:11
- [c59] Ammar Ahmad Awan, Hari Subramoni, Dhabaleswar K. Panda: An In-depth Performance Characterization of CPU- and GPU-based DNN Training on Modern Architectures. MLHPC@SC 2017: 8:1-8:8
- [c58] Mohammadreza Bayatpour, Sourav Chakraborty, Hari Subramoni, Xiaoyi Lu, Dhabaleswar K. Panda: Scalable reduction collectives with data partitioning-based multi-leader design. SC 2017: 64
- [c57] Hari Subramoni, Sourav Chakraborty, Dhabaleswar K. Panda: Designing Dynamic and Adaptive MPI Point-to-Point Communication Protocols for Efficient Overlap of Computation and Communication. ISC 2017: 334-354
- [i2] Ammar Ahmad Awan, Ching-Hsiang Chu, Hari Subramoni, Dhabaleswar K. Panda: Optimized Broadcast for Deep Learning Workloads on Dense-GPU InfiniBand Clusters: MPI or NCCL? CoRR abs/1707.09414 (2017)

2016
- [j4] Khaled Hamidouche, Akshay Venkatesh, Ammar Ahmad Awan, Hari Subramoni, Ching-Hsiang Chu, Dhabaleswar K. Panda: CUDA-Aware OpenSHMEM: Extensions and Designs for High Performance OpenSHMEM on GPU Clusters. Parallel Comput. 58: 27-36 (2016)
- [c56] Sourav Chakraborty, Hari Subramoni, Jonathan L. Perkins, Dhabaleswar K. Panda: SHMEMPMI - Shared Memory Based PMI for Improved Performance and Scalability. CCGrid 2016: 60-69
- [c55] Xiaoyi Lu, Dipti Shankar, Shashank Gugnani, Hari Subramoni, Dhabaleswar K. Panda: Impact of HPC Cloud Networking Technologies on Accelerating Hadoop RPC and HBase. CloudCom 2016: 310-317
- [c54] Mohammadreza Bayatpour, Hari Subramoni, Sourav Chakraborty, Dhabaleswar K. Panda: Adaptive and Dynamic Design for MPI Tag Matching. CLUSTER 2016: 1-10
- [c53] Jiajun Cao, Kapil Arya, Rohan Garg, L. Shawn Matott, Dhabaleswar K. Panda, Hari Subramoni, Jérôme Vienne, Gene Cooperman: System-Level Scalable Checkpoint-Restart for Petascale Computing. ICPADS 2016: 932-941
- [c52] Ching-Hsiang Chu, Khaled Hamidouche, Akshay Venkatesh, Dip Sankar Banerjee, Hari Subramoni, Dhabaleswar K. Panda: Exploiting Maximal Overlap for Non-Contiguous Data Movement Processing on Modern GPU-Enabled Systems. IPDPS 2016: 983-992
- [c51] Ching-Hsiang Chu, Khaled Hamidouche, Hari Subramoni, Akshay Venkatesh, Bracy Elton, Dhabaleswar K. Panda: Designing High Performance Heterogeneous Broadcast for Streaming Applications on GPU Clusters. SBAC-PAD 2016: 59-66
- [c50] Ching-Hsiang Chu, Khaled Hamidouche, Hari Subramoni, Akshay Venkatesh, Bracy Elton, Dhabaleswar K. Panda: Efficient Reliability Support for Hardware Multicast-Based Broadcast in GPU-enabled Streaming Applications. COMHPC@SC 2016: 29-38
- [c49] Mingzhe Li, Khaled Hamidouche, Xiaoyi Lu, Hari Subramoni, Jie Zhang, Dhabaleswar K. Panda: Designing MPI library with on-demand paging (ODP) of infiniband: challenges and benefits. SC 2016: 433-443
- [c48] Hari Subramoni, Albert Mathews Augustine, Mark Daniel Arnold, Jonathan L. Perkins, Xiaoyi Lu, Khaled Hamidouche, Dhabaleswar K. Panda: INAM2: InfiniBand Network Analysis and Monitoring with MPI. ISC 2016: 300-320
- [i1] Jiajun Cao, Kapil Arya, Rohan Garg, L. Shawn Matott, Dhabaleswar K. Panda, Hari Subramoni, Jérôme Vienne, Gene Cooperman: System-level Scalable Checkpoint-Restart for Petascale Computing. CoRR abs/1607.07995 (2016)

2015
- [c47]