


Ammar Ahmad Awan
Person information

- affiliation: The Ohio State University, Columbus, OH, USA
2020 – today
- 2023
- [i13]Siddharth Singh, Olatunji Ruwase, Ammar Ahmad Awan, Samyam Rajbhandari, Yuxiong He, Abhinav Bhatele:
A Novel Tensor-Expert Hybrid Parallelism Approach to Scale Mixture-of-Experts Training. CoRR abs/2303.06318 (2023)
- [i12]Quentin Anthony, Ammar Ahmad Awan, Jeff Rasley, Yuxiong He, Aamir Shafi, Mustafa Abduljabbar, Hari Subramoni, Dhabaleswar K. Panda:
MCR-DL: Mix-and-Match Communication Runtime for Deep Learning. CoRR abs/2303.08374 (2023)
- 2022
- [c32]Conglong Li, Ammar Ahmad Awan, Hanlin Tang, Samyam Rajbhandari, Yuxiong He:
1-bit LAMB: Communication Efficient Large-Scale Large-Batch Training with LAMB's Convergence Speed. HiPC 2022: 272-281
- [c31]Samyam Rajbhandari, Conglong Li, Zhewei Yao, Minjia Zhang, Reza Yazdani Aminabadi, Ammar Ahmad Awan, Jeff Rasley, Yuxiong He:
DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale. ICML 2022: 18332-18346
- [c30]Reza Yazdani Aminabadi, Samyam Rajbhandari, Ammar Ahmad Awan, Cheng Li, Du Li, Elton Zheng, Olatunji Ruwase, Shaden Smith, Minjia Zhang, Jeff Rasley, Yuxiong He:
DeepSpeed-Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale. SC 2022: 46:1-46:15
- [i11]Samyam Rajbhandari, Conglong Li, Zhewei Yao, Minjia Zhang, Reza Yazdani Aminabadi, Ammar Ahmad Awan, Jeff Rasley, Yuxiong He:
DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale. CoRR abs/2201.05596 (2022)
- [i10]Reza Yazdani Aminabadi, Samyam Rajbhandari, Minjia Zhang, Ammar Ahmad Awan, Cheng Li, Du Li, Elton Zheng, Jeff Rasley, Shaden Smith, Olatunji Ruwase, Yuxiong He:
DeepSpeed Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale. CoRR abs/2207.00032 (2022)
- 2021
- [c29]Hanlin Tang, Shaoduo Gan, Ammar Ahmad Awan, Samyam Rajbhandari, Conglong Li, Xiangru Lian, Ji Liu, Ce Zhang, Yuxiong He:
1-bit Adam: Communication Efficient Large-Scale Training with Adam's Convergence Speed. ICML 2021: 10118-10129
- [i9]Hanlin Tang, Shaoduo Gan, Ammar Ahmad Awan, Samyam Rajbhandari, Conglong Li, Xiangru Lian, Ji Liu, Ce Zhang, Yuxiong He:
1-bit Adam: Communication Efficient Large-Scale Training with Adam's Convergence Speed. CoRR abs/2102.02888 (2021)
- [i8]Conglong Li, Ammar Ahmad Awan, Hanlin Tang, Samyam Rajbhandari, Yuxiong He:
1-bit LAMB: Communication Efficient Large-Scale Large-Batch Training with LAMB's Convergence Speed. CoRR abs/2104.06069 (2021)
- [i7]Young Jin Kim, Ammar Ahmad Awan, Alexandre Muzio, Andrés Felipe Cruz-Salinas, Liyang Lu, Amr Hendy, Samyam Rajbhandari, Yuxiong He, Hany Hassan Awadalla:
Scalable and Efficient MoE Training for Multitask Multilingual Models. CoRR abs/2109.10465 (2021)
- 2020
- [j5]Ammar Ahmad Awan, Arpan Jain, Ching-Hsiang Chu, Hari Subramoni, Dhabaleswar K. Panda:
Communication Profiling and Characterization of Deep-Learning Workloads on Clusters With High-Performance Interconnects. IEEE Micro 40(1): 35-43 (2020)
- [c28]Ching-Hsiang Chu, Pouya Kousha, Ammar Ahmad Awan, Kawthar Shafie Khorassani, Hari Subramoni, Dhabaleswar K. Panda:
NV-group: link-efficient reduction for distributed deep learning on modern dense GPU systems. ICS 2020: 6:1-6:12
- [c27]Quentin Anthony, Ammar Ahmad Awan, Arpan Jain, Hari Subramoni, Dhabaleswar K. Panda:
Efficient Training of Semantic Image Segmentation on Summit using Horovod and MVAPICH2-GDR. IPDPS Workshops 2020: 1015-1023
- [c26]Arpan Jain, Ammar Ahmad Awan, Asmaa M. Aljuhani, Jahanzeb Maqbool Hashmi, Quentin G. Anthony, Hari Subramoni, Dhabaleswar K. Panda, Raghu Machiraju, Anil Parwani:
GEMS: GPU-enabled memory-aware model-parallelism system for distributed DNN training. SC 2020: 45
- [c25]Ammar Ahmad Awan, Arpan Jain, Quentin Anthony, Hari Subramoni, Dhabaleswar K. Panda:
HyPar-Flow: Exploiting MPI and Keras for Scalable Hybrid-Parallel DNN Training with TensorFlow. ISC 2020: 83-103
2010 – 2019
- 2019
- [j4]Ammar Ahmad Awan, Karthik Vadambacheri Manian, Ching-Hsiang Chu, Hari Subramoni, Dhabaleswar K. Panda:
Optimized large-message broadcast for deep learning workloads: MPI, MPI+NCCL, or NCCL2? Parallel Comput. 85: 141-152 (2019)
- [j3]Ching-Hsiang Chu, Xiaoyi Lu, Ammar Ahmad Awan, Hari Subramoni, Bracy Elton, Dhabaleswar K. Panda:
Exploiting Hardware Multicast and GPUDirect RDMA for Efficient Broadcast. IEEE Trans. Parallel Distributed Syst. 30(3): 575-588 (2019)
- [c24]Ammar Ahmad Awan, Jeroen Bédorf, Ching-Hsiang Chu, Hari Subramoni, Dhabaleswar K. Panda:
Scalable Distributed DNN Training using TensorFlow and CUDA-Aware MPI: Characterization, Designs, and Performance Evaluation. CCGRID 2019: 498-507
- [c23]Arpan Jain, Ammar Ahmad Awan, Quentin Anthony, Hari Subramoni, Dhabaleswar K. Panda:
Performance Characterization of DNN Training using TensorFlow and PyTorch on Modern Clusters. CLUSTER 2019: 1-11
- [c22]Ammar Ahmad Awan, Arpan Jain, Ching-Hsiang Chu, Hari Subramoni, Dhabaleswar K. Panda:
Communication Profiling and Characterization of Deep Learning Workloads on Clusters with High-Performance Interconnects. Hot Interconnects 2019: 49-53
- [c21]Dhabaleswar K. Panda, Ammar Ahmad Awan, Hari Subramoni:
High performance distributed deep learning: a beginner's guide. PPoPP 2019: 452-454
- [c20]Arpan Jain, Ammar Ahmad Awan, Hari Subramoni, Dhabaleswar K. Panda:
Scaling TensorFlow, PyTorch, and MXNet using MVAPICH2 for High-Performance Deep Learning on Frontera. DLS@SC 2019: 76-83
- [c19]Karthik Vadambacheri Manian, Ching-Hsiang Chu, Ammar Ahmad Awan, Kawthar Shafie Khorassani, Hari Subramoni:
OMB-UM: Design, Implementation, and Evaluation of CUDA Unified Memory Aware MPI Benchmarks. PMBS@SC 2019: 82-92
- [i6]Ammar Ahmad Awan, Arpan Jain, Quentin Anthony, Hari Subramoni, Dhabaleswar K. Panda:
HyPar-Flow: Exploiting MPI and Keras for Scalable Hybrid-Parallel DNN Training using TensorFlow. CoRR abs/1911.05146 (2019)
- 2018
- [c18]Ammar Ahmad Awan, Ching-Hsiang Chu, Hari Subramoni, Xiaoyi Lu, Dhabaleswar K. Panda:
OC-DNN: Exploiting Advanced Unified Memory Capabilities in CUDA 9 and Volta GPUs for Out-of-Core DNN Training. HiPC 2018: 143-152
- [c17]Ammar Ahmad Awan, Ching-Hsiang Chu, Hari Subramoni, Dhabaleswar K. Panda:
Optimized Broadcast for Deep Learning Workloads on Dense-GPU InfiniBand Clusters: MPI or NCCL? EuroMPI 2018: 2:1-2:9
- [i5]Ammar Ahmad Awan, Jeroen Bédorf, Ching-Hsiang Chu, Hari Subramoni, Dhabaleswar K. Panda:
Scalable Distributed DNN Training using TensorFlow and CUDA-Aware MPI: Characterization, Designs, and Performance Evaluation. CoRR abs/1810.11112 (2018)
- 2017
- [c16]Ching-Hsiang Chu, Xiaoyi Lu, Ammar Ahmad Awan, Hari Subramoni, Jahanzeb Maqbool Hashmi, Bracy Elton, Dhabaleswar K. Panda:
Efficient and Scalable Multi-Source Streaming Broadcast on GPU Clusters for Deep Learning. ICPP 2017: 161-170
- [c15]Ammar Ahmad Awan, Khaled Hamidouche, Jahanzeb Maqbool Hashmi, Dhabaleswar K. Panda:
S-Caffe: Co-designing MPI Runtimes and Caffe for Scalable Deep Learning on Modern GPU Clusters. PPoPP 2017: 193-205
- [c14]Ammar Ahmad Awan, Hari Subramoni, Dhabaleswar K. Panda:
An In-depth Performance Characterization of CPU- and GPU-based DNN Training on Modern Architectures. MLHPC@SC 2017: 8:1-8:8
- [i4]Ammar Ahmad Awan, Ching-Hsiang Chu, Hari Subramoni, Dhabaleswar K. Panda:
Optimized Broadcast for Deep Learning Workloads on Dense-GPU InfiniBand Clusters: MPI or NCCL? CoRR abs/1707.09414 (2017)
- 2016
- [j2]Khaled Hamidouche, Akshay Venkatesh, Ammar Ahmad Awan, Hari Subramoni, Ching-Hsiang Chu, Dhabaleswar K. Panda:
CUDA-Aware OpenSHMEM: Extensions and Designs for High Performance OpenSHMEM on GPU Clusters. Parallel Comput. 58: 27-36 (2016)
- [c13]Ching-Hsiang Chu, Khaled Hamidouche, Akshay Venkatesh, Ammar Ahmad Awan, Dhabaleswar K. Panda:
CUDA Kernel Based Collective Reduction Operations on Large-scale GPU Clusters. CCGrid 2016: 726-735
- [c12]Khaled Hamidouche, Ammar Ahmad Awan, Akshay Venkatesh, Dhabaleswar K. Panda:
CUDA M3: Designing Efficient CUDA Managed Memory-Aware MPI by Exploiting GDR and IPC. HiPC 2016: 52-61
- [c11]A. A. Awan, Khaled Hamidouche, Akshay Venkatesh, Dhabaleswar K. Panda:
Efficient Large Message Broadcast using NCCL and CUDA-Aware MPI for Deep Learning. EuroMPI 2016: 15-22
- 2015
- [c10]Khaled Hamidouche, Akshay Venkatesh, Ammar Ahmad Awan, Hari Subramoni, Ching-Hsiang Chu, Dhabaleswar K. Panda:
Exploiting GPUDirect RDMA in Designing High Performance OpenSHMEM for NVIDIA GPU Clusters. CLUSTER 2015: 78-87
- [c9]Sourav Chakraborty, Hari Subramoni, Jonathan L. Perkins, Ammar Ahmad Awan, Dhabaleswar K. Panda:
On-demand Connection Management for OpenSHMEM and OpenSHMEM+MPI. IPDPS Workshops 2015: 235-244
- [c8]A. A. Awan, Khaled Hamidouche, Ching-Hsiang Chu, Dhabaleswar K. Panda:
A Case for Non-blocking Collectives in OpenSHMEM: Design, Implementation, and Performance Evaluation using MVAPICH2-X. OpenSHMEM 2015: 69-86
- [c7]A. A. Awan, Khaled Hamidouche, Akshay Venkatesh, Jonathan L. Perkins, Hari Subramoni, Dhabaleswar K. Panda:
GPU-Aware Design, Implementation, and Evaluation of Non-blocking Collective Benchmarks. EuroMPI 2015: 9:1-9:10
- [c6]Hari Subramoni, Ammar Ahmad Awan, Khaled Hamidouche, Dmitry Pekurovsky, Akshay Venkatesh, Sourav Chakraborty, Karen Tomko, Dhabaleswar K. Panda:
Designing Non-blocking Personalized Collectives with Near Perfect Overlap for RDMA-Enabled Clusters. ISC 2015: 434-453
- 2013
- [j1]Zeeshan Pervez, Ammar Ahmad Awan, Asad Masood Khattak, Sungyoung Lee, Eui-Nam Huh:
Privacy-aware searching with oblivious term matching for cloud storage. J. Supercomput. 63(2): 538-560 (2013)
- [c5]N. Amjad, Nadeem Javaid, Arsalan Haider, A. A. Awan, M. Rahman:
DREEM-ME: Distributed Regional Energy Efficient Multi-hop Routing Protocol Based on Maximum Energy in WSNs. BWCCA 2013: 43-48
- [c4]Arsalan Haider, Nadeem Javaid, N. Amjad, A. A. Awan, Abid Khan, Nasir Khan:
REECH-ME: Regional Energy Efficient Cluster Heads Based on Maximum Energy Routing Protocol for WSNs. BWCCA 2013: 88-92
- [c3]Ammar Ahmad Awan, Muhammad Bilal Amin, Shujaat Hussain, Aamir Shafi, Sungyoung Lee:
An MPI-IO Compliant Java Based Parallel I/O Library. CCGRID 2013: 174-175
- [i3]Arsalan Haider, Nadeem Javaid, N. Amjad, A. A. Awan, Abid Khan, Nasir Khan:
REECH-ME: Regional Energy Efficient Cluster Heads based on Maximum Energy Routing Protocol for WSNs. CoRR abs/1307.7052 (2013)
- [i2]N. Amjad, Nadeem Javaid, Arsalan Haider, A. A. Awan, M. Rahman:
DREEM-ME: Distributed Regional Energy Efficient Multi-hop Routing Protocol based on Maximum Energy in WSNs. CoRR abs/1307.7075 (2013)
- [i1]Obaid Ur Rehman, Nadeem Javaid, Arsalan Haider, N. Amjad, A. A. Awan, M. Qamar, Zahoor Ali Khan, U. Qasim:
An Energy Efficient Decoding Scheme for Wireless Body Area Sensor Networks. CoRR abs/1309.4374 (2013)
- 2012
- [c2]Muhammad Bilal Amin, Wajahat Ali Khan, Ammar Ahmad Awan, Sungyoung Lee:
Intercloud message exchange middleware. ICUIMC 2012: 79:1-79:7
- [c1]Ammar Ahmad Awan, Muhammad Sohaib Ayub, Aamir Shafi, Sungyoung Lee:
Towards Efficient Support for Parallel I/O in Java HPC. PDCAT 2012: 137-143
last updated on 2023-05-25 22:26 CEST by the dblp team
all metadata released as open data under CC0 1.0 license