


Wen-Mei W. Hwu
Wen-mei W. Hwu – Wen-Mei Hwu
Person information
- affiliation: University of Illinois at Urbana-Champaign, Department of Electrical and Computer Engineering, Urbana-Champaign, IL, USA
- award (1999): Grace Murray Hopper Award
2020 – today
- 2024
- [j85]Mohit Mahajan, Wen-Mei Hwu, Rakesh Nagi:
Determining optimal channel partition for 2:4 fine grained structured sparsity. Optim. Lett. 18(9): 2079-2090 (2024)
- [j84]Jeongmin Brian Park, Vikram Sharma Mailthody, Zaid Qureshi, Wen-Mei Hwu:
Accelerating Sampling and Aggregation Operations in GNN Frameworks with GPU Initiated Direct Storage Accesses. Proc. VLDB Endow. 17(6): 1227-1240 (2024)
- [c247]Chia-Hao Chang, Jihoon Han, Anand Sivasubramaniam, Vikram Sharma Mailthody, Zaid Qureshi, Wen-Mei Hwu:
GMT: GPU Orchestrated Memory Tiering for the Big Data Era. ASPLOS (3) 2024: 464-478
- [c246]Kun Wu, Mert Hidayetoglu, Xiang Song, Sitao Huang, Da Zheng, Israt Nisa, Wen-Mei Hwu:
Hector: An Efficient Programming and Compilation Framework for Implementing Relational Graph Neural Networks in GPU Architectures. ASPLOS (3) 2024: 528-544
- [c245]Mert Hidayetoglu, Simon Garcia De Gonzalo, Elliott Slaughter, Yu Li, Christopher Zimmer, Tekin Bicer, Bin Ren, William Gropp, Wen-Mei Hwu, Alex Aiken:
CommBench: Micro-Benchmarking Hierarchical Networks with Multi-GPU, Multi-NIC Nodes. ICS 2024: 426-436
- [c244]Ali Hassani, Wen-Mei Hwu, Humphrey Shi:
Faster Neighborhood Attention: Reducing the O(n^2) Cost of Self Attention at the Threadblock Level. NeurIPS 2024
- [c243]Kaiwen Cao, Archit Gajjar, Liad Gerstman, Kun Wu, Sai Rahul Chalamalasetti, Aditya Dhakal, Giacomo Pedretti, Pavana Prakash, Wen-Mei Hwu, Deming Chen, Dejan S. Milojicic:
Acceleration of Graph Neural Networks with Heterogenous Accelerators Architecture. SC Workshops 2024: 1081-1089
- [i62]Ali Hassani, Wen-Mei Hwu, Humphrey Shi:
Faster Neighborhood Attention: Reducing the O(n^2) Cost of Self Attention at the Threadblock Level. CoRR abs/2403.04690 (2024)
- [i61]Jeongmin Brian Park, Kun Wu, Vikram Sharma Mailthody, Zaid Qureshi, Scott A. Mahlke, Wen-Mei W. Hwu:
LSM-GNN: Large-scale Storage-based Multi-GPU GNN Training by Optimizing Data Transfer Scheme. CoRR abs/2407.15264 (2024)
- [i60]Mert Hidayetoglu, Simon Garcia de Gonzalo, Elliott Slaughter, Pinku Surana, Wen-Mei W. Hwu, William Gropp, Alex Aiken:
HiCCL: A Hierarchical Collective Communication Library. CoRR abs/2408.05962 (2024)
- [i59]Kun Wu, Jeongmin Brian Park, Xiaofan Zhang, Mert Hidayetoglu, Vikram Sharma Mailthody, Sitao Huang, Steven S. Lumetta, Wen-Mei W. Hwu:
TBA: Faster Large Language Model Training Using SSD-Based Activation Offloading. CoRR abs/2408.10013 (2024)
- 2023
- [j83]Mohamed El-Hadedy, Xinfei Guo, Kazutomo Yoshii, Yichen Cai, Robert Herndon, Bryan Banta, Wen-Mei Hwu:
RECO-ASCON: Reconfigurable ASCON hash functions for IoT applications. Integr. 93: 102061 (2023)
- [c242]Mohammad Almasri, Yen-Hsiang Chang, Izzat El Hajj, Rakesh Nagi, Jinjun Xiong, Wen-mei W. Hwu:
Parallelizing Maximal Clique Enumeration on GPUs. PACT 2023: 162-175
- [c241]Jie Huang, Kevin Chen-Chuan Chang, Jinjun Xiong, Wen-Mei Hwu:
Can Language Models Be Specific? How? ACL (Findings) 2023: 716-727
- [c240]Zaid Qureshi, Vikram Sharma Mailthody, Isaac Gelado, Seungwon Min, Amna Masood, Jeongmin Brian Park, Jinjun Xiong, Chris J. Newburn, Dmitri Vainbrand, I-Hsin Chung, Michael Garland, William J. Dally, Wen-Mei W. Hwu:
GPU-Initiated On-Demand High-Throughput Storage Access in the BaM System Architecture. ASPLOS (2) 2023: 325-339
- [c239]Luyang Yu, Yizhen Lu, Meghna Mandava, Edward Richter, Vikram Sharma Mailthody, Seungwon Min, Wen-Mei W. Hwu, Deming Chen:
FSSD: FPGA-Based Emulator for SSDs. FPL 2023: 101-108
- [c238]Samiran Kawtikwar, Mohammad Almasri, Wen-Mei Hwu, Rakesh Nagi, Jinjun Xiong:
BEEP: Balanced Efficient subgraph Enumeration in Parallel. ICPP 2023: 142-152
- [c237]Mohamed El-Hadedy, Russell Hua, Kazutomo Yoshii, Wen-Mei Hwu, Martin Margala:
RECO-LFSR: Reconfigurable Low-power Cryptographic processor based on LFSR for Trusted IoT platforms. ISQED 2023: 1-7
- [c236]Arpandeep Khatua, Vikram Sharma Mailthody, Bhagyashree Taleka, Tengfei Ma, Xiang Song, Wen-Mei Hwu:
IGB: Addressing The Gaps In Labeling, Features, Heterogeneity, and Size of Public Graph Datasets for Deep Learning Research. KDD 2023: 4284-4295
- [c235]Mohamed El-Hadedy, Russell Hua, Shahzman Saqib, Kazutomo Yoshii, Wen-Mei Hwu, Martin Margala:
BLTESTI: Benchmarking Lightweight TinyJAMBU on Embedded Systems for Trusted IoT. SOCC 2023: 1-6
- [c234]Benjamin Reidys, Yuqi Xue, Daixuan Li, Bharat Sukhwani, Wen-Mei Hwu, Deming Chen, Sameh W. Asaad, Jian Huang:
RackBlox: A Software-Defined Rack-Scale Storage System with Network-Storage Co-Design. SOSP 2023: 182-199
- [i58]Kun Wu, Mert Hidayetoglu, Xiang Song, Sitao Huang, Da Zheng, Israt Nisa, Wen-Mei W. Hwu:
PIGEON: Optimizing CUDA Code Generator for End-to-End Training and Inference of Relational Graph Neural Networks. CoRR abs/2301.06284 (2023)
- [i57]Arpandeep Khatua, Vikram Sharma Mailthody, Bhagyashree Taleka, Tengfei Ma, Xiang Song, Wen-mei W. Hwu:
IGB: Addressing The Gaps In Labeling, Features, Heterogeneity, and Size of Public Graph Datasets for Deep Learning Research. CoRR abs/2302.13522 (2023)
- [i56]Jeongmin Brian Park, Vikram Sharma Mailthody, Zaid Qureshi, Wen-Mei Hwu:
Accelerating Sampling and Aggregation Operations in GNN Frameworks with GPU Initiated Direct Storage Accesses. CoRR abs/2306.16384 (2023)
- [i55]Jeongmin Brian Park, Zaid Qureshi, Vikram S. Mailthody, Andrew Gacek, Shunfan Shao, Mohammad Almasri, Isaac Gelado, Jinjun Xiong, Chris J. Newburn, I-Hsin Chung, Michael Garland, Nikolay Sakharnykh, Wen-Mei W. Hwu:
CODAG: Characterizing and Optimizing Decompression Algorithms for GPUs. CoRR abs/2307.03760 (2023)
- [i54]Benjamin Reidys, Yuqi Xue, Daixuan Li, Bharat Sukhwani, Wen-mei W. Hwu, Deming Chen, Sameh W. Asaad, Jian Huang:
RackBlox: A Software-Defined Rack-Scale Storage System with Network-Storage Co-Design. CoRR abs/2309.06513 (2023)
- 2022
- [j82]Omer Anjum, Mohammad Almasri, Simon Garcia de Gonzalo, Wen-Mei W. Hwu:
An efficient GPU implementation and scaling for higher-order 3D stencils. Inf. Sci. 586: 326-343 (2022)
- [j81]Xiaofan Zhang, Yuan Ma, Jinjun Xiong, Wen-Mei W. Hwu, Volodymyr V. Kindratenko, Deming Chen:
Exploring HW/SW Co-Design for Video Analysis on CPU-FPGA Heterogeneous Systems. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 41(6): 1606-1619 (2022)
- [j80]Mert Hidayetoglu, Tekin Biçer, Simon Garcia de Gonzalo, Bin Ren, Doga Gürsoy, Rajkumar Kettimuthu, Ian T. Foster, Wen-Mei W. Hwu:
MemXCT: Design, Optimization, Scaling, and Reproducibility of X-Ray Tomography Imaging. IEEE Trans. Parallel Distributed Syst. 33(9): 2014-2031 (2022)
- [c233]Jie Huang, Kevin Chang, Jinjun Xiong, Wen-Mei Hwu:
Open Relation Modeling: Learning to Define Relations between Entities. ACL (Findings) 2022: 297-308
- [c232]Mhd Ghaith Olabi, Juan Gómez-Luna, Onur Mutlu, Wen-Mei Hwu, Izzat El Hajj:
A Compiler Framework for Optimizing Dynamic Parallelism on GPUs. CGO 2022: 1-13
- [c231]Jie Huang, Hanyin Shao, Kevin Chen-Chuan Chang, Jinjun Xiong, Wen-Mei Hwu:
Understanding Jargon: Combining Extraction and Generation for Definition Modeling. EMNLP 2022: 3994-4004
- [c230]Jie Huang, Kerui Zhu, Kevin Chen-Chuan Chang, Jinjun Xiong, Wen-Mei Hwu:
DEER: Descriptive Knowledge Graph for Explaining Entity Relationships. EMNLP 2022: 6686-6698
- [c229]Mohammad Almasri, Izzat El Hajj, Rakesh Nagi, Jinjun Xiong, Wen-Mei Hwu:
Parallel K-clique counting on GPUs. ICS 2022: 21:1-21:14
- [c228]Vibhor Dodeja, Mohammad Almasri, Rakesh Nagi, Jinjun Xiong, Wen-Mei Hwu:
PARSEC: PARallel Subgraph Enumeration in CUDA. IPDPS 2022: 168-178
- [c227]Seungwon Min, Kun Wu, Mert Hidayetoglu, Jinjun Xiong, Xiang Song, Wen-Mei Hwu:
Graph Neural Network Training and Data Tiering. KDD 2022: 3555-3565
- [c226]Xiangdong Wei, Mohamed El-Hadedy, Sergiu Mosanu, Zhengping Zhu, Wen-Mei Hwu, Xinfei Guo:
RECO-HCON: A High-Throughput Reconfigurable Compact ASCON Processor for Trusted IoT. SOCC 2022: 1-6
- [d1]Zaid Qureshi, Vikram Sharma Mailthody, Isaac Gelado, Seungwon Min, Amna Masood, Jeongmin Brian Park, Jinjun Xiong, Chris J. Newburn, Dmitri Vainbrand, I-Hsin Chung, Michael Garland, William J. Dally, Wen-mei W. Hwu:
GPU-Initiated On-Demand High-Throughput Storage Access in the BaM System Architecture. Zenodo, 2022
- [i53]Mhd Ghaith Olabi, Juan Gómez-Luna, Onur Mutlu, Wen-Mei W. Hwu, Izzat El Hajj:
A Compiler Framework for Optimizing Dynamic Parallelism on GPUs. CoRR abs/2201.02789 (2022)
- [i52]Zaid Qureshi, Vikram Sharma Mailthody, Isaac Gelado, Seungwon Min, Amna Masood, Jeongmin Brian Park, Jinjun Xiong, Chris J. Newburn, Dmitri Vainbrand, I-Hsin Chung, Michael Garland, William J. Dally, Wen-Mei W. Hwu:
BaM: A Case for Enabling Fine-grain High Throughput GPU-Orchestrated Access to Storage. CoRR abs/2203.04910 (2022)
- [i51]Jie Huang, Kerui Zhu, Kevin Chen-Chuan Chang, Jinjun Xiong, Wen-Mei Hwu:
DKG: A Descriptive Knowledge Graph for Explaining Relationships between Entities. CoRR abs/2205.10479 (2022)
- [i50]Jie Huang, Kevin Chen-Chuan Chang, Jinjun Xiong, Wen-Mei Hwu:
Can Language Models Be Specific? How? CoRR abs/2210.05159 (2022)
- [i49]Omer Anjum, Alok Kamatar, Toby Liang, Jinjun Xiong, Wen-Mei Hwu:
Submission-Aware Reviewer Profiling for Reviewer Recommender System. CoRR abs/2211.04194 (2022)
- [i48]Mohammad Almasri, Yen-Hsiang Chang, Izzat El Hajj, Rakesh Nagi, Jinjun Xiong, Wen-mei W. Hwu:
Parallelizing Maximal Clique Enumeration on GPUs. CoRR abs/2212.01473 (2022)
- 2021
- [j79]Seungwon Min, Kun Wu, Sitao Huang, Mert Hidayetoglu, Jinjun Xiong, Eiman Ebrahimi, Deming Chen, Wen-mei W. Hwu:
Large Graph Convolutional Network Training with GPU-Oriented Data Communication Architecture. Proc. VLDB Endow. 14(11): 2087-2100 (2021)
- [j78]Sitao Huang, Kun Wu, Hyunmin Jeong, Chengyue Wang, Deming Chen, Wen-Mei Hwu:
PyLog: An Algorithm-Centric Python-Based FPGA Programming and Synthesis Flow. IEEE Trans. Computers 70(12): 2015-2028 (2021)
- [j77]Qin Li, Xiaofan Zhang, Jinjun Xiong, Wen-Mei Hwu, Deming Chen:
Efficient Methods for Mapping Neural Machine Translator on FPGAs. IEEE Trans. Parallel Distributed Syst. 32(7): 1866-1877 (2021)
- [c225]Sultan Durrani, Muhammad Saad Chughtai, Mert Hidayetoglu, Rashid Tahir, Abdul Dakkak, Lawrence Rauchwerger, Fareed Zaffar, Wen-Mei W. Hwu:
Accelerating Fourier and Number Theoretic Transforms using Tensor Cores and Warp Shuffles. PACT 2021: 345-355
- [c224]Jie Huang, Kevin Chang, Jinjun Xiong, Wen-Mei Hwu:
Measuring Fine-Grained Domain Relevance of Terms: A Hierarchical Core-Fringe Approach. ACL/IJCNLP (1) 2021: 3641-3651
- [c223]Ashutosh Dhar, Paul Reckamp, Jinjun Xiong, Wen-Mei Hwu, Deming Chen:
Graviton: A Reconfigurable Memory-Compute Fabric for Data Intensive Applications. ARC 2021: 254-264
- [c222]Sitao Huang, Aayush Ankit, Plínio Silveira, Rodrigo Antunes, Sai Rahul Chalamalasetti, Izzat El Hajj, Dong Eun Kim, Glaucimar Aguiar, Pedro Bruel, Sergey Serebryakov, Cong Xu, Can Li, Paolo Faraboschi, John Paul Strachan, Deming Chen, Kaushik Roy, Wen-Mei W. Hwu, Dejan S. Milojicic:
Mixed Precision Quantization for ReRAM-based DNN Inference Accelerators. ASP-DAC 2021: 372-377
- [c221]Jiachen Li, Bowen Cheng, Rogério Feris, Jinjun Xiong, Thomas S. Huang, Wen-Mei Hwu, Humphrey Shi:
Pseudo-IoU: Improving Label Assignment in Anchor-Free Object Detection. CVPR Workshops 2021: 2378-2387
- [c220]Chengyue Wang, Sitao Huang, Wen-Mei Hwu, Deming Chen:
Extending HLS with High-Level Descriptive Language for Configurable Algorithm-Level Spatial Structure Design. FCCM 2021: 261
- [c219]Sitao Huang, Kun Wu, Hyunmin Jeong, Chengyue Wang, Deming Chen, Wen-Mei Hwu:
PyLog: An Algorithm-Centric Python-Based FPGA Programming and Synthesis Flow. FPGA 2021: 227-228
- [c218]Carl Pearson, Kun Wu, I-Hsin Chung, Jinjun Xiong, Wen-Mei Hwu:
TEMPI: An Interposed MPI Library with a Canonical Representation of CUDA-aware Datatypes. HPDC 2021: 95-106
- [c217]Mohammad Almasri, Neo Vasudeva, Rakesh Nagi, Jinjun Xiong, Wen-Mei Hwu:
HyKernel: A Hybrid Selection of One/Two-Phase Kernels for Triangle Counting on GPUs. HPEC 2021: 1-7
- [c216]Zhonghao Wang, Kai Wang, Mo Yu, Jinjun Xiong, Wen-Mei Hwu, Mark Hasegawa-Johnson, Humphrey Shi:
Interpretable Visual Reasoning via Induced Symbolic Space. ICCV 2021: 1858-1867
- [c215]Sultan Durrani, Muhammad Saad Chughtai, Abdul Dakkak, Wen-Mei Hwu, Lawrence Rauchwerger:
FFT blitz: the tensor cores strike back. PPoPP 2021: 488-489
- [c214]Omer Anjum, Mohammad Almasri, Jinjun Xiong, Wen-Mei W. Hwu:
PhraseScope: An Effective and Unsupervised Framework for Mining High Quality Phrases. SDM 2021: 639-647
- [i47]Vikram Sharma Mailthody, James Wei, Nicholas Chen, Mohammad Behnia, Ruihao Yao, Qihao Wang, Vedant Agrawal, Churan He, Lijian Wang, Leihao Chen, Amit Agarwal, Edward Richter, Wen-Mei Hwu, Christopher W. Fletcher, Jinjun Xiong, Andrew Miller, Sanjay Patel:
Safer Illinois and RokWall: Privacy Preserving University Health Apps for COVID-19. CoRR abs/2101.07897 (2021)
- [i46]Seungwon Min, Kun Wu, Sitao Huang, Mert Hidayetoglu, Jinjun Xiong, Eiman Ebrahimi, Deming Chen, Wen-Mei W. Hwu:
PyTorch-Direct: Enabling GPU Centric Data Access for Very Large Graph Neural Network Training with Irregular Accesses. CoRR abs/2101.07956 (2021)
- [i45]Seungwon Min, Kun Wu, Sitao Huang, Mert Hidayetoglu, Jinjun Xiong, Eiman Ebrahimi, Deming Chen, Wen-Mei W. Hwu:
Large Graph Convolutional Network Training with GPU-Oriented Data Communication Architecture. CoRR abs/2103.03330 (2021)
- [i44]Mohammad Almasri, Izzat El Hajj, Rakesh Nagi, Jinjun Xiong, Wen-Mei W. Hwu:
K-Clique Counting on GPUs. CoRR abs/2104.13209 (2021)
- [i43]Jiachen Li, Bowen Cheng, Rogério Feris, Jinjun Xiong, Thomas S. Huang, Wen-Mei Hwu, Humphrey Shi:
Pseudo-IoU: Improving Label Assignment in Anchor-Free Object Detection. CoRR abs/2104.14082 (2021)
- [i42]Jie Huang, Kevin Chen-Chuan Chang, Jinjun Xiong, Wen-mei W. Hwu:
Measuring Fine-Grained Domain Relevance of Terms: A Hierarchical Core-Fringe Approach. CoRR abs/2105.13255 (2021)
- [i41]Jie Huang, Kevin Chen-Chuan Chang, Jinjun Xiong, Wen-Mei W. Hwu:
Open Relation Modeling: Learning to Define Relations between Entities. CoRR abs/2108.09241 (2021)
- [i40]Yen-Hsiang Chang, Jianhao Pu, Wen-Mei W. Hwu, Jinjun Xiong:
MLHarness: A Scalable Benchmarking System for MLCommons. CoRR abs/2111.05231 (2021)
- [i39]Seungwon Min, Kun Wu, Mert Hidayetoglu, Jinjun Xiong, Xiang Song, Wen-mei W. Hwu:
Graph Neural Network Training with Data Tiering. CoRR abs/2111.05894 (2021)
- 2020
- [j76]Seungwon Min, Vikram Sharma Mailthody, Zaid Qureshi, Jinjun Xiong, Eiman Ebrahimi, Wen-Mei Hwu:
EMOGI: Efficient Memory-access for Out-of-memory Graph-traversal In GPUs. Proc. VLDB Endow. 14(2): 114-127 (2020)
- [j75]Aayush Ankit, Izzat El Hajj, Sai Rahul Chalamalasetti, Sapan Agarwal, Matthew J. Marinella, Martin Foltin, John Paul Strachan, Dejan S. Milojicic, Wen-Mei Hwu, Kaushik Roy:
PANTHER: A Programmable Architecture for Neural Network Training Harnessing Energy-Efficient ReRAM. IEEE Trans. Computers 69(8): 1128-1142 (2020)
- [c213]Cheng Li, Abdul Dakkak, Jinjun Xiong, Wen-Mei W. Hwu:
The Design and Implementation of a Scalable Deep Learning Benchmarking Platform. CLOUD 2020: 414-425
- [c212]Abdul Dakkak, Tom Wickham-Jones, Wen-Mei Hwu:
The design and implementation of the wolfram language compiler. CGO 2020: 212-228
- [c211]Omer Anjum, Chak Ho Chan, Tanitpong Lawphongpanich, Yucheng Liang, Tianyi Tang, Shuchen Zhang, Wen-Mei Hwu, Jinjun Xiong, Sanjay Patel:
Vertext: An End-to-end AI Powered Conversation Management System for Multi-party Chat Platforms. CSCW Companion 2020: 1-6
- [c210]Zhonghao Wang, Yunchao Wei, Rogério Schmidt Feris, Jinjun Xiong, Wen-Mei W. Hwu, Thomas S. Huang, Honghui Shi:
Alleviating Semantic-level Shift: A Semi-supervised Domain Adaptation Method for Semantic Segmentation. CVPR Workshops 2020: 4043-4047
- [c209]Zhonghao Wang, Mo Yu, Yunchao Wei, Rogério Feris, Jinjun Xiong, Wen-Mei Hwu, Thomas S. Huang, Honghui Shi:
Differential Treatment for Stuff and Things: A Simple Unsupervised Domain Adaptation Method for Semantic Segmentation. CVPR 2020: 12632-12641
- [c208]Yuhong Li, Cong Hao, Xiaofan Zhang, Xinheng Liu, Yao Chen, Jinjun Xiong, Wen-mei W. Hwu, Deming Chen:
EDD: Efficient Differentiable DNN Architecture and Implementation Co-search for Embedded AI Solutions. DAC 2020: 1-6
- [c207]Jie Huang, Zilong Wang, Kevin Chang, Wen-Mei Hwu, Jinjun Xiong:
Exploring Semantic Capacity of Terms. EMNLP (1) 2020: 8509-8518
- [c206]Cong Hao, Yao Chen, Xiaofan Zhang, Yuhong Li, Jinjun Xiong, Wen-Mei Hwu, Deming Chen:
Effective Algorithm-Accelerator Co-design for AI Solutions on Edge Devices. ACM Great Lakes Symposium on VLSI 2020: 283-290
- [c205]Mert Hidayetoglu, Carl Pearson, Vikram Sharma Mailthody, Eiman Ebrahimi, Jinjun Xiong, Rakesh Nagi, Wen-Mei Hwu:
At-Scale Sparse Deep Neural Network Inference With Efficient GPU Implementation. HPEC 2020: 1-7
- [c204]Xiaofan Zhang, Hanchen Ye, Junsong Wang, Yonghua Lin, Jinjun Xiong, Wen-Mei Hwu, Deming Chen:
DNNExplorer: A Framework for Modeling and Exploring a Novel Paradigm of FPGA-based DNN Accelerator. ICCAD 2020: 61:1-61:9
- [c203]Mohamed El-Hadedy, Martin Margala, Sergiu Mosanu, Danilo Gligoroski, Jinjun Xiong, Wen-Mei Hwu:
Micro - GAGE: A Low-power Compact GAGE Hash Function Processor for IoT Applications. ICECS 2020: 1-4
- [c202]Cheng Li, Abdul Dakkak, Jinjun Xiong, Wei Wei, Lingjie Xu, Wen-Mei Hwu:
XSP: Across-Stack Profiling and Analysis of Machine Learning Models on GPUs. IPDPS 2020: 326-327
- [c201]Cheng Li, Abdul Dakkak, Jinjun Xiong, Wen-Mei Hwu:
Benanza: Automatic μBenchmark Generation to Compute "Lower-bound" Latency and Inform Optimizations of Deep Learning Models on GPUs. IPDPS 2020: 440-450
- [c200]Carl Pearson, Mert Hidayetoglu, Mohammad Almasri, Omer Anjum, I-Hsin Chung, Jinjun Xiong, Wen-Mei W. Hwu:
Node-Aware Stencil Communication for Heterogeneous Supercomputers. IPDPS Workshops 2020: 796-805
- [c199]Wen-Mei Hwu:
Advancing Computing Infrastructure for Very Large-Scale Deep Learning at C3SR. IPDPS Workshops 2020: 989
- [c198]Ashutosh Dhar, Xiaohao Wang, Hubertus Franke, Jinjun Xiong, Jian Huang, Wen-Mei W. Hwu, Nam Sung Kim, Deming Chen:
FReaC Cache: Folded-logic Reconfigurable Computing in the Last Level Cache. MICRO 2020: 102-117
- [c197]Xiaofan Zhang, Haoming Lu, Cong Hao, Jiachen Li, Bowen Cheng, Yuhong Li, Kyle Rupnow, Jinjun Xiong, Thomas S. Huang, Honghui Shi, Wen-Mei Hwu, Deming Chen:
SkyNet: a Hardware-Efficient Method for Object Detection and Tracking on Embedded Systems. MLSys 2020
- [c196]Abdul Dakkak, Cheng Li, Jinjun Xiong, Wen-mei W. Hwu:
DLSpec: A Deep Learning Task Exchange Specification. OpML 2020
- [c195]Mert Hidayetoglu, Tekin Bicer, Simon Garcia De Gonzalo, Bin Ren, Vincent De Andrade, Doga Gürsoy, Raj Kettimuthu, Ian T. Foster, Wen-mei W. Hwu:
Petascale XCT: 3D image reconstruction with hierarchical communications on multi-GPU nodes. SC 2020: 37
- [c194]Cheng Li, Abdul Dakkak, Jinjun Xiong, Wen-Mei Hwu:
DLBricks: Composable Benchmark Generation to Reduce Deep Learning Benchmarking Effort on CPUs. ICPE 2020: 202-209
- [e5]Eduard Ayguadé, Wen-mei W. Hwu, Rosa M. Badia, H. Peter Hofstee:
ICS '20: 2020 International Conference on Supercomputing, Barcelona Spain, June, 2020. ACM 2020, ISBN 978-1-4503-7983-0
- [i38]Abdul Dakkak, Cheng Li, Jinjun Xiong, Wen-Mei Hwu:
MLModelScope: A Distributed Platform for Model Evaluation and Benchmarking at Scale. CoRR abs/2002.08295 (2020)
- [i37]Abdul Dakkak, Cheng Li, Jinjun Xiong, Wen-Mei Hwu:
DLSpec: A Deep Learning Task Exchange Specification. CoRR abs/2002.11262 (2020)
- [i36]Zhonghao Wang, Mo Yu, Yunchao Wei, Rogério Schmidt Feris, Jinjun Xiong, Wen-Mei Hwu, Thomas S. Huang, Honghui Shi:
Differential Treatment for Stuff and Things: A Simple Unsupervised Domain Adaptation Method for Semantic Segmentation. CoRR abs/2003.08040 (2020)
- [i35]Zhonghao Wang, Yunchao Wei, Rogério Feris, Jinjun Xiong, Wen-Mei Hwu, Thomas S. Huang, Honghui Shi:
Alleviating Semantic-level Shift: A Semi-supervised Domain Adaptation Method for Semantic Segmentation. CoRR abs/2004.00794 (2020)
- [i34]Yuhong Li, Cong Hao, Xiaofan Zhang, Xinheng Liu, Yao Chen, Jinjun Xiong, Wen-Mei W. Hwu, Deming Chen:
EDD: Efficient Differentiable DNN Architecture and Implementation Co-search for Embedded AI Solutions. CoRR abs/2005.02563 (2020)
- [i33]Seungwon Min, Vikram Sharma Mailthody, Zaid Qureshi, Jinjun Xiong, Eiman Ebrahimi, Wen-Mei W. Hwu:
EMOGI: Efficient Memory-access for Out-of-memory Graph-traversal In GPUs. CoRR abs/2006.06890 (2020)
- [i32]Mert Hidayetoglu, Carl Pearson, Vikram Sharma Mailthody, Eiman Ebrahimi, Jinjun Xiong, Rakesh Nagi, Wen-mei W. Hwu:
Efficient Inference on GPUs for the Sparse Deep Neural Network Graph Challenge 2020. CoRR abs/2007.14152 (2020)
- [i31]Zaid Qureshi, Vikram Sharma Mailthody, Seungwon Min, I-Hsin Chung, Jinjun Xiong, Wen-Mei W. Hwu:
Tearing Down the Memory Wall. CoRR abs/2008.10169 (2020)
- [i30]Xiaofan Zhang, Hanchen Ye, Junsong Wang, Yonghua Lin, Jinjun Xiong, Wen-Mei W. Hwu, Deming Chen:
DNNExplorer: A Framework for Modeling and Exploring a Novel Paradigm of FPGA-based DNN Accelerator. CoRR abs/2008.12745 (2020)
- [i29]Mert Hidayetoglu, Tekin Bicer, Simon Garcia De Gonzalo, Bin Ren, Vincent De Andrade, Doga Gürsoy, Raj Kettimuthu, Ian T. Foster, Wen-mei W. Hwu:
Petascale XCT: 3D Image Reconstruction with Hierarchical Communications on Multi-GPU Nodes. CoRR abs/2009.07226 (2020)
- [i28]Jie Huang, Zilong Wang, Kevin Chen-Chuan Chang, Wen-Mei Hwu, Jinjun Xiong:
Exploring Semantic Capacity of Terms. CoRR abs/2010.01898 (2020)
- [i27]Cong Hao, Yao Chen, Xiaofan Zhang, Yuhong Li, Jinjun Xiong, Wen-Mei Hwu, Deming Chen:
Effective Algorithm-Accelerator Co-design for AI Solutions on Edge Devices. CoRR abs/2010.07185 (2020)
- [i26]Zhonghao Wang, Mo Yu, Kai Wang, Jinjun Xiong, Wen-Mei Hwu, Mark Hasegawa-Johnson, Humphrey Shi:
Interpretable Visual Reasoning via Induced Symbolic Space. CoRR abs/2011.11603 (2020)
- [i25]Carl Pearson, Kun Wu, I-Hsin Chung, Jinjun Xiong, Wen-Mei Hwu:
Fast CUDA-Aware MPI Datatypes without Platform Support. CoRR abs/2012.14363 (2020)
2010 – 2019
- 2019
- [c193]Abdul Dakkak, Cheng Li, Simon Garcia De Gonzalo, Jinjun Xiong, Wen-Mei Hwu:
TrIMS: Transparent and Isolated Model Sharing for Low Latency Deep Learning Inference in Function-as-a-Service. CLOUD 2019: 372-382
- [c192]Qin Li, Xiaofan Zhang, Jinjun Xiong, Wen-Mei Hwu, Deming Chen:
Implementing neural machine translation with bi-directional GRU and attention mechanism on FPGAs using HLS. ASP-DAC 2019: 693-698
- [c191]Aayush Ankit, Izzat El Hajj, Sai Rahul Chalamalasetti, Geoffrey Ndu, Martin Foltin, R. Stanley Williams, Paolo Faraboschi, Wen-mei W. Hwu, John Paul Strachan, Kaushik Roy, Dejan S. Milojicic:
PUMA: A Programmable Ultra-efficient Memristor-based Accelerator for Machine Learning Inference. ASPLOS 2019: 715-731
- [c190]Ahmed H. M. O. Abulila, Vikram Sharma Mailthody, Zaid Qureshi, Jian Huang, Nam Sung Kim, Jinjun Xiong, Wen-Mei W. Hwu:
FlatFlash: Exploiting the Byte-Accessibility of SSDs within a Unified Memory-Storage Hierarchy. ASPLOS 2019: 971-985
- [c189]Simon Garcia De Gonzalo, Sitao Huang, Juan Gómez-Luna, Simon D. Hammond, Onur Mutlu, Wen-Mei Hwu:
Automatic Generation of Warp-Level Primitives and Atomic Instructions for Fast and Portable Parallel Reduction on GPUs. CGO 2019: 73-84
- [c188]