


27th HPCA 2021: Seoul, South Korea
- IEEE International Symposium on High-Performance Computer Architecture, HPCA 2021, Seoul, South Korea, February 27 - March 3, 2021. IEEE 2021, ISBN 978-1-6654-2235-2
Security Architectures
- Seonjin Na, Sunho Lee, Yeonjae Kim, Jongse Park, Jaehyuk Huh:
Common Counters: Compressed Encryption Counters for Secure GPU Memory. 1-13
- Dingyuan Cao, Mingzhe Zhang, Hang Lu, Xiaochun Ye, Dongrui Fan, Yuezhi Che, Rujia Wang:
Streamline Ring ORAM Accesses through Spatial and Temporal Optimization. 14-25
- Brandon Reagen, Wooseok Choi, Yeongil Ko, Vincent T. Lee, Hsien-Hsin S. Lee, Gu-Yeon Wei, David Brooks:
Cheetah: Optimizing and Accelerating Homomorphic Encryption for Private Inference. 26-39
- Zecheng He, Guangyuan Hu, Ruby B. Lee:
New Models for Understanding and Reasoning about Speculative Execution Attacks. 40-53
Accelerators for Machine Learning 1
- Sean Kinzer, Joon Kyung Kim, Soroush Ghodrati, Brahmendra Reddy Yatham, Alric Althoff, Divya Mahajan, Sorin Lerner, Hadi Esmaeilzadeh:
A Computational Stack for Cross-Domain Acceleration. 54-70
- Hyoukjun Kwon, Liangzhen Lai, Michael Pellauer, Tushar Krishna, Yu-Hsin Chen, Vikas Chandra:
Heterogeneous Dataflow Accelerators for Multi-DNN Workloads. 71-83
- Reza Hojabr, Ali Sedaghati, Amirali Sharifian, Ahmad Khonsari, Arrvindh Shriraman:
SPAGHETTI: Streaming Accelerators for Highly Sparse GEMM on FPGAs. 84-96
- Hanrui Wang, Zhekai Zhang, Song Han:
SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning. 97-110
Storage Systems
- Mohammad A. Alshboul, Prakash Ramrakhyani, William Wang, James Tuck, Yan Solihin:
BBB: Simplifying Persistent Programming using Battery-Backed Buffers. 111-124
- Per Ekemark, Yuan Yao, Alberto Ros, Konstantinos Sagonas, Stefanos Kaxiras:
TSOPER: Efficient Coherence-Based Strict Persistency. 125-138
- Mazen Al-Wadi, Vamsee Reddy Kommareddy, Clayton Hughes, Simon David Hammond, Amro Awad:
Stealth-Persist: Architectural Support for Persistent Applications in Hybrid Memory Systems. 139-152
Quantum Computing
- Xin-Chuan Wu, Dripto M. Debroy, Yongshan Ding, Jonathan M. Baker, Yuri Alexeev, Kenneth R. Brown, Frederic T. Chong:
TILT: Achieving Higher Fidelity on a Trapped-Ion Linear-Tape Quantum Computing Architecture. 153-166
- Lei Liu, Xinglei Dou:
QuCloud: A New Qubit Mapping Mechanism for Multi-programming Quantum Computing in Cloud Environment. 167-178
- Ji Liu, Huiyang Zhou:
Systematic Approaches for Precise and Approximate Quantum State Runtime Assertion. 179-193
- Aneeqa Fatima, Igor L. Markov:
Faster Schrödinger-style simulation of quantum circuits. 194-207
Systems for Machine Learning 1
- Sung-En Chang, Yanyu Li, Mengshu Sun, Runbin Shi, Hayden K. H. So, Xuehai Qian, Yanzhi Wang, Xue Lin:
Mix and Match: A Novel FPGA-Centric Deep Neural Network Quantization Framework. 208-220
- Mohsen Imani, Zhuowen Zou, Samuel Bosch, Sanjay Anantha Rao, Sahand Salamat, Venkatesh Kumar, Yeseong Kim, Tajana Rosing:
Revisiting HyperDimensional Learning for FPGA and Low-Power Architectures. 221-234
- Youngeun Kwon, Yunjae Lee, Minsoo Rhu:
Tensor Casting: Co-Designing Algorithm-Architecture for Personalized Recommendation Training. 235-248
- Heesu Kim, Hanmin Park, Taehyun Kim, Kwanheum Cho, Eojin Lee, Soojung Ryu, Hyuk-Jae Lee, Kiyoung Choi, Jinho Lee:
GradPIM: A Practical Processing-in-DRAM Architecture for Gradient Descent. 249-262
Cache Design
- Christina Giannoula, Nandita Vijaykumar, Nikela Papadopoulou, Vasileios Karakostas, Ivan Fernandez, Juan Gómez-Luna, Lois Orosa, Nectarios Koziris, Georgios I. Goumas, Onur Mutlu:
SynCron: Efficient Synchronization Support for Near-Data-Processing Architectures. 263-276
- Mainak Chaudhuri:
Zero Directory Eviction Victim: Unbounded Coherence Directory and Core Cache Isolation. 277-290
- Subhash Sethumurugan, Jieming Yin, John Sartori:
Designing a Cost-Effective Cache Replacement Policy using Machine Learning. 291-303
- Antonio Franques, Apostolos Kokolis, Sergi Abadal, Vimuth Fernando, Sasa Misailovic, Josep Torrellas:
WiDir: A Wireless-Enabled Directory Cache Coherence Protocol. 304-317
Security Attacks
- Zhihui Shao, Mohammad A. Islam, Shaolei Ren:
Heat Behind the Meter: A Hidden Threat of Thermal Attacks in Edge Colocation Data Centers. 318-331
- Jaeguk Ahn, Cheolgyu Jin, Jiho Kim, Minsoo Rhu, Yunsi Fei, David R. Kaeli, John Kim:
Trident: A Hybrid Correlation-Collision GPU Cache Timing Attack for AES Key Recovery. 332-344
- Abdullah Giray Yaglikçi, Minesh Patel, Jeremie S. Kim, Roknoddin Azizi, Ataberk Olgun, Lois Orosa, Hasan Hassan, Jisung Park, Konstantinos Kanellopoulos, Taha Shahroodi, Saugata Ghose, Onur Mutlu:
BlockHammer: Preventing RowHammer at Low Cost by Blacklisting Rapidly-Accessed DRAM Rows. 345-358
- Jianming Huang, Yu Hua:
A Write-Friendly and Fast-Recovery Scheme for Security Metadata in Non-Volatile Memories. 359-370
Hardware Accelerators Beyond Machine Learning
- Yu Zhang, Xiaofei Liao, Hai Jin, Ligang He, Bingsheng He, Haikun Liu, Lin Gu:
DepGraph: A Dependency-Driven Accelerator for Efficient Iterative Graph Processing. 371-384
- Yifan Yuan, Yipeng Wang, Ren Wang, Rangeen Basu Roy Chowdhury, Charlie Tai, Nam Sung Kim:
QEI: Query Acceleration Can be Generic and Efficient in the Cloud. 385-398
- Lei Jiang, Farzaneh Zokaee:
EXMA: A Genomics Accelerator for Exact-Matching. 399-411
- Christopher Torng, Peitian Pan, Yanghui Ou, Cheng Tan, Christopher Batten:
Ultra-Elastic CGRAs for Irregular Loop Specialization. 412-425
Memory and Storage Architectures
- Chun-Yi Liu, Yunju Lee, Wonil Choi, Myoungsoo Jung, Mahmut Taylan Kandemir, Chita R. Das:
GSSA: A Resource Allocation Scheme Customized for 3D NAND SSDs. 426-439
- Ananth Krishna Prasad, Morteza Rezaalipour, Masoud Dehyadegari, Mahdi Nazm Bojnordi:
Memristive Data Ranking. 440-452
- Vamsee Reddy Kommareddy, Clayton Hughes, Simon David Hammond, Amro Awad:
DeACT: Architecture-Aware Virtual Memory Support for Fabric Attached Memory Systems. 453-466
High Throughput Architectures
- Mohamed Assem Ibrahim, Onur Kayiran, Yasuko Eckert, Gabriel H. Loh, Adwait Jog:
Analyzing and Leveraging Decoupled L1 Caches in GPUs. 467-478
- Tsung Tai Yeh, Matthew D. Sinclair, Bradford M. Beckmann, Timothy G. Rogers:
Deadline-Aware Offloading for High-Throughput Accelerators. 479-492
- Yujeong Choi, Yunseong Kim, Minsoo Rhu:
Lazy Batching: An SLA-aware Batching System for Cloud Machine Learning Inference. 493-506
- Chandrashis Mazumdar, Prachatos Mitra, Arkaprava Basu:
Dead Page and Dead Block Predictors: Cleaning TLBs and Caches Together. 507-519
Power Efficiency and Resiliency
- Sam Ainsworth, Lionel Zoubritzky, Alan Mycroft, Timothy M. Jones:
ParaDox: Eliminating Voltage Margins via Heterogeneous Fault Tolerance. 520-532
- Jian Chen, Xiaowei Jiang, Ying Zhang, Liyin Liu, Huifeng Xu, Qiang Liu:
CARE: Coordinated Augmentation for Elastic Resilience on DRAM Errors in Data Centers. 533-544
- Erick Carvajal Barboza, Sara Jacob, Mahesh Ketkar, Michael Kishinevsky, Paul Gratz, Jiang Hu:
Automatic Microprocessor Performance Bug Detection. 545-556
- Helena Caminal, Kailin Yang, Srivatsa Srinivasa, Akshay Krishna Ramanathan, Khalid Al-Hawaj, Tianshu Wu, Vijaykrishnan Narayanan, Christopher Batten, José F. Martínez:
CAPE: A Content-Addressable Processing Engine. 557-569
Systems for Machine Learning 2
- Xinfeng Xie, Zheng Liang, Peng Gu, Abanti Basak, Lei Deng, Ling Liang, Xing Hu, Yuan Xie:
SpaceA: Sparse Matrix Vector Multiplication on Processing-in-Memory Accelerator. 570-583
- Young H. Oh, Seonghak Kim, Yunho Jin, Sam Son, Jonghyun Bae, Jongsung Lee, Yeonhong Park, Dong Uk Kim, Tae Jun Ham, Jae W. Lee:
Layerweaver: Maximizing Resource Utilization of Neural Processing Units via Layer-Wise Scheduling. 584-597
- Jie Ren, Jiaolin Luo, Kai Wu, Minjia Zhang, Hyeran Jeon, Dong Li:
Sentinel: Efficient Tensor Migration and Allocation on Heterogeneous Memory Systems for Deep Learning. 598-611
- Jiajun Li, Ahmed Louri, Avinash Karanth, Razvan C. Bunescu:
CSCNN: Algorithm-hardware Co-design for CNN Accelerators using Centrosymmetric Filters. 612-625
Best Paper Nominees
- B. Pratheek, Neha Jawalkar, Arkaprava Basu:
Improving GPU Multi-tenancy with Page Walk Stealing. 626-639
- Zhengrong Wang, Jian Weng, Jason Lowe-Power, Jayesh Gaur, Tony Nowatzki:
Stream Floating: Enabling Proactive and Decentralized Cache Optimizations. 640-653
- Nishil Talati, Kyle May, Armand Behroozi, Yichen Yang, Kuba Kaszyk, Christos Vasiladiotis, Tarunesh Verma, Lu Li, Brandon Nguyen, Jiawen Sun, John Magnus Morton, Agreen Ahmadi, Todd M. Austin, Michael F. P. O'Boyle, Scott A. Mahlke, Trevor N. Mudge, Ronald G. Dreslinski:
Prodigy: Improving the Memory Latency of Data-Indirect Irregular Workloads Using Hardware-Software Co-Design. 654-667
- Vignesh Balaji, Neal Clayton Crago, Aamer Jaleel, Brandon Lucia:
P-OPT: Practical Optimal Cache Replacement for Graph Analytics. 668-681
Network on Chip
- Hossein Farrokhbakht, Henry Kao, Kamran Hasan, Paul V. Gratz, Tushar Krishna, Joshua San Miguel, Natalie D. Enright Jerger:
Pitstop: Enabling a Virtual Network Free Network-on-Chip. 682-695
- Gyuyoung Kwauk, Seungkwan Kang, Hans Kasan, Hyojun Son, John Kim:
BoomGate: Deadlock Avoidance in Non-Minimal Routing for High-Radix Networks. 696-708
- Xiaowei Ren, Mieszko Lis:
CHOPIN: Scalable Graphics Rendering in Multi-GPU Systems via Parallel Image Composition. 709-722
- Hao Zheng, Ke Wang, Ahmed Louri:
Adapt-NoC: A Flexible Network-on-Chip Design for Heterogeneous Manycore Architectures. 723-735
Emerging Technologies and Applications
- Chencheng Ye, Yuanchao Xu, Xipeng Shen, Xiaofei Liao, Hai Jin, Yan Solihin:
Hardware-Based Address-Centric Acceleration of Key-Value Store. 736-748
- Richard Afoakwa, Yiqiao Zhang, Uday Kumar Reddy Vengalam, Zeljko Ignjatovic, Michael C. Huang:
BRIM: Bistable Resistively-Coupled Ising Machine. 749-760
- Ben Feinberg, Ryan Wong, T. Patrick Xiao, Christopher H. Bennett, Jacob N. Rohan, Erik G. Boman, Matthew J. Marinella, Sapan Agarwal, Engin Ipek:
An Analog Preconditioner for Solving Linear Systems. 761-774
- Jiajun Li, Ahmed Louri, Avinash Karanth, Razvan C. Bunescu:
GCNAX: A Flexible and Energy-efficient Accelerator for Graph Convolutional Neural Networks. 775-788
Industry Track 1
- Heng Liao, Jiajin Tu, Jing Xia, Hu Liu, Xiping Zhou, Honghui Yuan, Yuxing Hu:
Ascend: a Scalable and Unified Architecture for Ubiquitous Deep Neural Network Computing : Industry Track Paper. 789-801
- Bilge Acun, Matthew Murphy, Xiaodong Wang, Jade Nie, Carole-Jean Wu, Kim M. Hazelwood:
Understanding Training Efficiency of Deep Learning Recommendation Models at Scale. 802-814
- Ying Zhang, Jian Chen, Xiaowei Jiang, Qiang Liu, Ian M. Steiner, Andrew J. Herdrich, Kevin Shu, Ripan Das, Long Cui, Litrin Jiang:
LIBRA: Clearing the Cloud Through Dynamic Memory Bandwidth Management. 815-826
- Yiming Gan, Bo Yu, Boyuan Tian, Leimeng Xu, Wei Hu, Shaoshan Liu, Qiang Liu, Yanjun Zhang, Jie Tang, Yuhao Zhu:
Eudoxus: Characterizing and Accelerating Localization in Autonomous Machines Industry Track Paper. 827-840
Industry Track 2
- Tianqi Tang, Sheng Li, Lifeng Nai, Norman P. Jouppi, Yuan Xie:
NeuroMeter: An Integrated Power, Area, and Timing Modeling Framework for Machine Learning Accelerators Industry Track Paper. 841-853
- Udit Gupta, Young Geun Kim, Sylvia Lee, Jordan Tse, Hsien-Hsin S. Lee, Gu-Yeon Wei, David Brooks, Carole-Jean Wu:
Chasing Carbon: The Elusive Environmental Footprint of Computing. 854-867
- Oreste Villa, Daniel Lustig, Zi Yan, Evgeny Bolotin, Yaosheng Fu, Niladrish Chatterjee, Nan Jiang, David W. Nellans:
Need for Speed: Experiences Building a Trustworthy System-Level GPU Simulator. 868-880
- Rohan Basu Roy, Tirthak Patel, Raj Kettimuthu, William E. Allcock, Paul Rich, Adam Scovel, Devesh Tiwari:
Operating Liquid-Cooled Large-Scale Systems: Long-Term Monitoring, Reliability Analysis, and Efficiency Measures. 881-893
Best of CAL
Accelerators for Machine Learning 2
- Jianxun Yang, Zhao Zhang, Zhuangzhi Liu, Jing Zhou, Leibo Liu, Shaojun Wei, Shouyi Yin:
FuseKNA: Fused Kernel Convolution based Accelerator for Deep Neural Networks. 894-907
- Bahar Asgari, Ramyad Hadidi, Jiashen Cao, Da Eun Shim, Sung Kyu Lim, Hyesoon Kim:
FAFNIR: Accelerating Sparse Gathering by Using Efficient Near-Memory Intelligent Reduction. 908-920
- Julian Pavon, Iván Vargas Valdivieso, Adrián Barredo, Joan Marimon, Miquel Moretó, Francesc Moll, Osman S. Unsal, Mateo Valero, Adrián Cristal:
VIA: A Smart Scratchpad for Vector Units with Application to Sparse Matrix Computations. 921-934
