


Shuaiwen Song
Person information
- affiliation: University of Sydney, Australia
2020 – today
- 2025
 [j24]Fengxiang Bie, Yibo Yang, Zhongzhu Zhou, Adam Ghanem, Minjia Zhang, Zhewei Yao, Xiaoxia Wu, Connor Holmes, Pareesa Ameneh Golnari, David A. Clifton, Yuxiong He, Dacheng Tao, Shuaiwen Leon Song:
 RenAIssance: A Survey Into AI Text-to-Image Generation in the Era of Large Model. IEEE Trans. Pattern Anal. Mach. Intell. 47(3): 2212-2231 (2025)
 [i37]Muru Zhang, Mayank Mishra, Zhongzhu Zhou, William Brandon, Jue Wang, Yoon Kim, Jonathan Ragan-Kelley, Shuaiwen Leon Song, Ben Athiwaratkun, Tri Dao:
 Ladder-residual: parallelism-aware architecture for accelerating large model inference with communication overlapping. CoRR abs/2501.06589 (2025)
 [i36]Junlin Wang, Shang Zhu, Jon Saad-Falcon, Ben Athiwaratkun, Qingyang Wu, Jue Wang, Shuaiwen Leon Song, Ce Zhang, Bhuwan Dhingra, James Zou:
 Think Deep, Think Fast: Investigating Efficiency of Verifier-free Inference-time-scaling Methods. CoRR abs/2504.14047 (2025)
 [i35]Rahul Thapa, Andrew Li, Qingyang Wu, Bryan He, Yuki Sahashi, Christina Binder, Angela Zhang, Ben Athiwaratkun, Shuaiwen Leon Song, David Ouyang, James Zou:
 How Well Can General Vision-Language Models Learn Medicine By Watching Public Educational Videos? CoRR abs/2504.14391 (2025)
 [i34]Junlin Wang, Roy Xie, Shang Zhu, Jue Wang, Ben Athiwaratkun, Bhuwan Dhingra, Shuaiwen Leon Song, Ce Zhang, James Zou:
 Improving Model Alignment Through Collective Intelligence of Open-Source LLMS. CoRR abs/2505.03059 (2025)
 [i33]Rahul Thapa, Qingyang Wu, Kevin Wu, Harrison Zhang, Angela Zhang, Eric Wu, Haotian Ye, Suhana Bedi, Nevin Aresh, Joseph Boen, Shriya Reddy, Ben Athiwaratkun, Shuaiwen Leon Song, James Zou:
 Disentangling Reasoning and Knowledge in Medical Large Language Models. CoRR abs/2505.11462 (2025)
 [i32]Berkan Dokmeci, Qingyang Wu, Ben Athiwaratkun, Ce Zhang, Shuaiwen Leon Song, James Zou:
 Data Diversification Methods In Alignment Enhance Math Performance In LLMs. CoRR abs/2507.02173 (2025)
 [i31]Zhongzhu Zhou, Yibo Yang, Ziyan Chen, Fengxiang Bie, Haojun Xia, Xiaoxia Wu, Robert Wu, Ben Athiwaratkun, Bernard Ghanem, Shuaiwen Leon Song:
 Imitate Optimal Policy: Prevail and Induce Action Collapse in Policy Gradient. CoRR abs/2509.02737 (2025)
- 2024
 [j23]Chengying Huan, Yongchao Liu, Heng Zhang, Shuaiwen Song, Santosh Pandey, Shiyang Chen, Xiangfei Fang, Yue Jin, Baptiste Lepers, Yanjun Wu, Hang Liu:
 TEA+: A Novel Temporal Graph Random Walk Engine with Hybrid Storage Architecture. ACM Trans. Archit. Code Optim. 21(2): 37 (2024)
 [j22]Fangtian Zhong, Xiuzhen Cheng, Dongxiao Yu, Bei Gong, Shuaiwen Song, Jiguo Yu:
 MalFox: Camouflaged Adversarial Malware Example Generation Based on Conv-GANs Against Black-Box Detectors. IEEE Trans. Computers 73(4): 980-993 (2024)
 [j21]Yufei Yang, Chenhao Xie, Liansheng Liu, Philip H. W. Leong, Shuaiwen Leon Song:
 Efficient Radius Search for Adaptive Foveal Sizing Mechanism in Collaborative Foveated Rendering Framework. IEEE Trans. Mob. Comput. 23(5): 3620-3632 (2024)
 [j20]Chengying Huan, Yongchao Liu, Heng Zhang, Hang Liu, Shiyang Chen, Shuaiwen Leon Song, Yanjun Wu:
 TeGraph+: Scalable Temporal Graph Processing Enabling Flexible Edge Modifications. IEEE Trans. Parallel Distributed Syst. 35(8): 1469-1487 (2024)
 [c80]Sam Ade Jacobs, Masahiro Tanaka, Chengming Zhang, Minjia Zhang, Reza Yazdani Aminabadi, Shuaiwen Leon Song, Samyam Rajbhandari, Yuxiong He:
 System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models. IPDPS (Workshops) 2024: 1206-1208
 [c79]Yibo Yang, Xiaojie Li, Zhongzhu Zhou, Shuaiwen Song, Jianlong Wu, Liqiang Nie, Bernard Ghanem:
 CorDA: Context-Oriented Decomposition Adaptation of Large Language Models for Task-Aware Parameter-Efficient Fine-tuning. NeurIPS 2024
 [c78]Donglin Zhuang, Zhen Zheng, Haojun Xia, Xiafei Qiu, Junjie Bai, Wei Lin, Shuaiwen Leon Song:
 MonoNN: Enabling a New Monolithic Optimization Space for Neural Network Inference Tasks on Modern GPU-Centric Architectures. OSDI 2024: 989-1005
 [c77]Sam Ade Jacobs, Masahiro Tanaka, Chengming Zhang, Minjia Zhang, Reza Yazdani Aminabadi, Shuaiwen Leon Song, Samyam Rajbhandari, Yuxiong He:
 System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models. PODC 2024: 121-130
 [c76]Haojun Xia, Zhen Zheng, Xiaoxia Wu, Shiyang Chen, Zhewei Yao, Stephen Youn, Arash Bakhtiari, Michael Wyatt, Donglin Zhuang, Zhongzhu Zhou, Olatunji Ruwase, Yuxiong He, Shuaiwen Leon Song:
 Quant-LLM: Accelerating the Serving of Large Language Models via FP6-Centric Algorithm-System Co-Design on Modern GPUs. USENIX ATC 2024: 699-713
 [i30]Haojun Xia, Zhen Zheng, Xiaoxia Wu, Shiyang Chen, Zhewei Yao, Stephen Youn, Arash Bakhtiari, Michael Wyatt, Donglin Zhuang, Zhongzhu Zhou, Olatunji Ruwase, Yuxiong He, Shuaiwen Leon Song:
 FP6-LLM: Efficiently Serving Large Language Models Through FP6-Centric Algorithm-System Co-Design. CoRR abs/2401.14112 (2024)
 [i29]Kezhen Chen, Rahul Thapa, Rahul Chalamala, Ben Athiwaratkun, Shuaiwen Leon Song, James Zou:
 Dragonfly: Multi-Resolution Zoom Supercharges Large Visual-Language Model. CoRR abs/2406.00977 (2024)
 [i28]Yibo Yang, Xiaojie Li, Zhongzhu Zhou, Shuaiwen Leon Song, Jianlong Wu, Liqiang Nie, Bernard Ghanem:
 CorDA: Context-Oriented Decomposition Adaptation of Large Language Models. CoRR abs/2406.05223 (2024)
- 2023
 [j19]Jianda Wang, Zhendong Wang, Bo Yu, Jie Tang, Shuaiwen Leon Song, Cong Liu, Yang Hu:
 Data Fusion in Infrastructure-Augmented Autonomous Driving System: Why? Where? and How? IEEE Internet Things J. 10(18): 15857-15871 (2023)
 [j18]Haojun Xia, Zhen Zheng, Yuchao Li, Donglin Zhuang, Zhongzhu Zhou, Xiafei Qiu, Yong Li, Wei Lin, Shuaiwen Leon Song:
 Flash-LLM: Enabling Low-Cost and Highly-Efficient Large Generative Model Inference With Unstructured Sparsity. Proc. VLDB Endow. 17(2): 211-224 (2023)
 [j17]Lening Wang, Qiyu Wan, Peixun Ma, Jing Wang, Mingsong Chen, Shuaiwen Leon Song, Xin Fu:
 Enabling High-Efficient ReRAM-Based CNN Training Via Exploiting Crossbar-Level Insignificant Writing Elimination. IEEE Trans. Computers 72(11): 3218-3230 (2023)
 [c75]Yue Jin, Chengying Huan, Heng Zhang, Yongchao Liu, Shuaiwen Leon Song, Rui Zhao, Yao Zhang, Changhua He, Wenguang Chen:
 G-Sparse: Compiler-Driven Acceleration for Generalized Sparse Computation for Graph Neural Networks on Modern GPUs. PACT 2023: 137-149
 [c74]Chengying Huan, Shuaiwen Leon Song, Santosh Pandey, Hang Liu, Yongchao Liu, Baptiste Lepers, Changhua He, Kang Chen, Jinlei Jiang, Yongwei Wu:
 TEA: A General-Purpose Temporal Graph Random Walk Engine. EuroSys 2023: 182-198
 [c73]Yu Wen, Chenhao Xie, Shuaiwen Leon Song, Xin Fu:
 Post0-VR: Enabling Universal Realistic Rendering for Modern VR via Exploiting Architectural Similarity and Data Sharing. HPCA 2023: 390-402
 [c72]Chengming Zhang, Shaden Smith, Baixi Sun, Jiannan Tian, Jonathan Soifer, Xiaodong Yu, Shuaiwen Leon Song, Yuxiong He, Dingwen Tao:
 HEAT: A Highly Efficient and Affordable Training System for Collaborative Filtering Based Recommendation on CPUs. ICS 2023: 324-335
 [c71]Qiyu Wan, Lening Wang, Jing Wang, Shuaiwen Leon Song, Xin Fu:
 NAS-SE: Designing A Highly-Efficient In-Situ Neural Architecture Search Engine for Large-Scale Deployment. MICRO 2023: 756-768
 [c70]Alan Robertson, Shuaiwen Song:
 Mitigating Coupling Map Constrained Correlated Measurement Errors on Quantum Devices. SC 2023: 62:1-62:13
 [i27]Chengming Zhang, Shaden Smith, Baixi Sun, Jiannan Tian, Jonathan Soifer, Xiaodong Yu, Shuaiwen Leon Song, Yuxiong He, Dingwen Tao:
 HEAT: A Highly Efficient and Affordable Training System for Collaborative Filtering Based Recommendation on CPUs. CoRR abs/2304.07334 (2023)
 [i26]Huwan Peng, Scott Davidson, Richard Shi, Shuaiwen Leon Song, Michael B. Taylor:
 Chiplet Cloud: Building AI Supercomputers for Serving Large Generative Language Models. CoRR abs/2307.02666 (2023)
 [i25]Zhewei Yao, Reza Yazdani Aminabadi, Olatunji Ruwase, Samyam Rajbhandari, Xiaoxia Wu, Ammar Ahmad Awan, Jeff Rasley, Minjia Zhang, Conglong Li, Connor Holmes, Zhongzhu Zhou, Michael Wyatt, Molly Smith, Lev Kurilenko, Heyang Qin, Masahiro Tanaka, Shuai Che, Shuaiwen Leon Song, Yuxiong He:
 DeepSpeed-Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales. CoRR abs/2308.01320 (2023)
 [i24]Fengxiang Bie, Yibo Yang, Zhongzhu Zhou, Adam Ghanem, Minjia Zhang, Zhewei Yao, Xiaoxia Wu, Connor Holmes, Pareesa Ameneh Golnari, David A. Clifton, Yuxiong He, Dacheng Tao, Shuaiwen Leon Song:
 RenAIssance: A Survey into AI Text-to-Image Generation in the Era of Large Model. CoRR abs/2309.00810 (2023)
 [i23]Haojun Xia, Zhen Zheng, Yuchao Li, Donglin Zhuang, Zhongzhu Zhou, Xiafei Qiu, Yong Li, Wei Lin, Shuaiwen Leon Song:
 Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity. CoRR abs/2309.10285 (2023)
 [i22]Sam Ade Jacobs, Masahiro Tanaka, Chengming Zhang, Minjia Zhang, Shuaiwen Leon Song, Samyam Rajbhandari, Yuxiong He:
 DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models. CoRR abs/2309.14509 (2023)
 [i21]Shuaiwen Leon Song, Bonnie Kruft, Minjia Zhang, Conglong Li, Shiyang Chen, Chengming Zhang, Masahiro Tanaka, Xiaoxia Wu, Jeff Rasley, Ammar Ahmad Awan, Connor Holmes, Martin Cai, Adam Ghanem, Zhongzhu Zhou, Yuxiong He, Pete Luferenko, Divya Kumar, Jonathan A. Weyn, Ruixiong Zhang, Sylwester Klocek, Volodymyr Vragov, Mohammed AlQuraishi, Gustaf Ahdritz, Christina Floristean, Cristina Negri, Rao Kotamarthi, Venkatram Vishwanath, Arvind Ramanathan, Sam Foreman, Kyle Hippe, Troy Arcomano, Romit Maulik, Maxim Zvyagin, Alexander Brace, Bin Zhang, Cindy Orozco Bohorquez, Austin Clyde, Bharat Kale, Danilo Perez-Rivera, Heng Ma, Carla M. Mann, Michael W. Irvin, J. Gregory Pauloski, Logan T. Ward, Valérie Hayot-Sasson, Murali Emani, Zhen Xie, Diangen Lin, Maulik Shukla, Ian T. Foster, James J. Davis, Michael E. Papka, Thomas S. Brettin, Prasanna Balaprakash, Gina Tourassi, John Gounley, Heidi A. Hanson, Thomas E. Potok, Massimiliano Lupo Pasini, Kate Evans, Dan Lu, Dalton D. Lunga, Junqi Yin, Sajal Dash, Feiyi Wang, Mallikarjun Shankar, Isaac Lyngaas, Xiao Wang, Guojing Cong, Pei Zhang, Ming Fan, Siyan Liu, Adolfy Hoisie, Shinjae Yoo, Yihui Ren, William Tang, Kyle Felker, Alexey Svyatkovskiy, Hang Liu, Ashwin M. Aji, Angela Dalton, Michael J. Schulte, Karl W. Schulz, Yuntian Deng, Weili Nie, Josh Romero, Christian Dallago, Arash Vahdat, Chaowei Xiao, Thomas Gibbs, Anima Anandkumar, Rick Stevens:
 DeepSpeed4Science Initiative: Enabling Large-Scale Scientific Discovery through Sophisticated AI System Technologies. CoRR abs/2310.04610 (2023)
- 2022
 [j16]Yiding Liu, Xingyao Zhang, Donglin Zhuang, Xin Fu, Shuaiwen Song:
 DynamAP: Architectural Support for Dynamic Graph Traversal on the Automata Processor. ACM Trans. Archit. Code Optim. 19(4): 60:1-60:26 (2022)
 [j15]Jidong Zhai, Liyan Zheng, Feng Zhang, Xiongchao Tang, Haojie Wang, Teng Yu, Yuyang Jin, Shuaiwen Leon Song, Wenguang Chen:
 Detecting Performance Variance for Parallel Applications Without Source Code. IEEE Trans. Parallel Distributed Syst. 33(10): 4239-4255 (2022)
 [c69]Chengying Huan, Shuaiwen Leon Song, Yongchao Liu, Heng Zhang, Hang Liu, Charles He, Kang Chen, Jinlei Jiang, Yongwei Wu:
 T-GCN: A Sampling Based Streaming Graph Neural Network System with Hybrid Architecture. PACT 2022: 69-82
 [c68]Zhen Zheng, Xuanda Yang, Pengzhan Zhao, Guoping Long, Kai Zhu, Feiwen Zhu, Wenyi Zhao, Xiaoyong Liu, Jun Yang, Jidong Zhai, Shuaiwen Leon Song, Wei Lin:
 AStitch: enabling a new multi-dimensional optimization space for memory-intensive ML training and inference on modern SIMT architectures. ASPLOS 2022: 359-373
 [c67]Chengying Huan, Hang Liu, Mengxing Liu, Yongchao Liu, Changhua He, Kang Chen, Jinlei Jiang, Yongwei Wu, Shuaiwen Leon Song:
 TeGraph: A Novel General-Purpose Temporal Graph Computing Engine. ICDE 2022: 578-592
 [c66]Heng Zhang, Lingda Li, Hang Liu, Donglin Zhuang, Rui Liu, Chengying Huan, Shuang Song, Dingwen Tao, Yongchao Liu, Charles He, Yanjun Wu, Shuaiwen Leon Song:
 Bring orders into uncertainty: enabling efficient uncertain graph processing via novel path sampling on multi-accelerator systems. ICS 2022: 11:1-11:14
 [c65]Donglin Zhuang, Xingyao Zhang, Shuaiwen Song, Sara Hooker:
 Randomness in Neural Network Training: Characterizing the Impact of Tooling. MLSys 2022
 [c64]Liyan Zheng, Jidong Zhai, Xiongchao Tang, Haojie Wang, Teng Yu, Yuyang Jin, Shuaiwen Leon Song, Wenguang Chen:
 Vapro: performance variance detection and diagnosis for production-run parallel applications. PPoPP 2022: 150-162
 [c63]Shaoshan Liu, Jianda Wang, Zhendong Wang, Bo Yu, Wei Hu, Yahui Liu, Jie Tang, Shuaiwen Leon Song, Cong Liu, Yang Hu:
 Brief Industry Paper: The Necessity of Adaptive Data Fusion in Infrastructure-Augmented Autonomous Driving System. RTAS 2022: 293-296
 [i20]Shaoshan Liu, Jianda Wang, Zhendong Wang, Bo Yu, Wei Hu, Yahui Liu, Jie Tang, Shuaiwen Leon Song, Cong Liu, Yang Hu:
 Brief Industry Paper: The Necessity of Adaptive Data Fusion in Infrastructure-Augmented Autonomous Driving System. CoRR abs/2207.00737 (2022)
 [i19]Zhendong Wang, Xiaoming Zeng, Shuaiwen Leon Song, Yang Hu:
 Towards Efficient Architecture and Algorithms for Sensor Fusion. CoRR abs/2209.06272 (2022)
 [i18]Jieyang Chen, Chenhao Xie, Jesun Sahariar Firoz, Jiajia Li, Shuaiwen Leon Song, Kevin J. Barker, Mark Raugas, Ang Li:
 MSREP: A Fast yet Light Sparse Matrix Framework for Multi-GPU Systems. CoRR abs/2209.07552 (2022)
- 2021
 [j14]Cody Rivera, Jieyang Chen, Nan Xiong, Jing Zhang, Shuaiwen Leon Song, Dingwen Tao:
 TSM2X: High-performance tall-and-skinny matrix-matrix multiplication on GPUs. J. Parallel Distributed Comput. 151: 70-85 (2021)
 [j13]Sian Jin, Chengming Zhang, Xintong Jiang, Yunhe Feng, Hui Guan, Guanpeng Li, Shuaiwen Song, Dingwen Tao:
 COMET: A Novel Memory-Efficient Deep Learning Training Framework by Using Error-Bounded Lossy Compression. Proc. VLDB Endow. 15(4): 886-899 (2021)
 [j12]Xingyao Zhang, Xin Fu, Donglin Zhuang, Chenhao Xie, Shuaiwen Leon Song:
 Enabling Highly Efficient Capsule Networks Processing Through Software-Hardware Co-Design. IEEE Trans. Computers 70(4): 495-510 (2021)
 [c62]Chenhao Xie, Xie Li, Yang Hu, Huwan Peng, Michael B. Taylor, Shuaiwen Leon Song:
 Q-VR: system-level design for future mobile collaborative virtual reality. ASPLOS 2021: 587-599
 [c61]Chenhao Xie, Jieyang Chen, Jesun Firoz, Jiajia Li, Shuaiwen Leon Song, Kevin J. Barker, Mark Raugas, Ang Li:
 Fast and Scalable Sparse Triangular Solver for Multi-GPU Based HPC Architectures. ICPP 2021: 53:1-53:11
 [c60]Chengming Zhang, Geng Yuan, Wei Niu, Jiannan Tian, Sian Jin, Donglin Zhuang, Zhe Jiang, Yanzhi Wang, Bin Ren, Shuaiwen Leon Song, Dingwen Tao:
 ClickTrain: efficient and accurate end-to-end deep learning training via fine-grained architecture-preserving pruning. ICS 2021: 266-278
 [c59]Xingyao Zhang, Haojun Xia, Donglin Zhuang, Hao Sun, Xin Fu, Michael B. Taylor, Shuaiwen Leon Song:
 η-LSTM: Co-Designing Highly-Efficient Large LSTM Training via Exploiting Memory-Saving and Architectural Design Opportunities. ISCA 2021: 567-580
 [c58]Qiyu Wan, Haojun Xia, Xingyao Zhang, Lening Wang, Shuaiwen Leon Song, Xin Fu:
 Shift-BNN: Highly-Efficient Probabilistic Bayesian Neural Network Training via Memory-Friendly Pattern Retrieving. MICRO 2021: 885-897
 [c57]Heng Zhang, Lingda Li, Donglin Zhuang, Rui Liu, Shuang Song, Dingwen Tao, Yanjun Wu, Shuaiwen Leon Song:
 An efficient uncertain graph processing framework for heterogeneous architectures. PPoPP 2021: 477-479
 [c56]Sian Jin, Guanpeng Li, Shuaiwen Leon Song, Dingwen Tao:
 A novel memory-efficient deep learning training framework via error-bounded lossy compression. PPoPP 2021: 485-487
 [c55]Anil Gaihre, Da Zheng, Scott Weitze, Lingda Li, Shuaiwen Leon Song, Caiwen Ding, Xiaoye S. Li, Hang Liu:
 Dr. Top-k: delegate-centric Top-k on GPUs. SC 2021: 39
 [c54]Kiran Ranganath, Joshua D. Suetterlein, Joseph B. Manzano, Shuaiwen Leon Song, Daniel Wong:
 MAPA: multi-accelerator pattern allocation policy for multi-tenant GPU servers. SC 2021: 99
 [c53]Jialiang Tan, Yu Chen, Zhenming Liu, Bin Ren, Shuaiwen Leon Song, Xipeng Shen, Xu Liu:
 Toward efficient interactions between Python and native libraries. ESEC/SIGSOFT FSE 2021: 1117-1128
 [i17]Chenhao Xie, Xie Li, Yang Hu, Huwan Peng, Michael B. Taylor, Shuaiwen Leon Song:
 Q-VR: System-Level Design for Future Mobile Collaborative Virtual Reality. CoRR abs/2102.13191 (2021)
 [i16]Donglin Zhuang, Xingyao Zhang, Shuaiwen Leon Song, Sara Hooker:
 Randomness In Neural Network Training: Characterizing The Impact of Tooling. CoRR abs/2106.11872 (2021)
 [i15]Jialiang Tan, Yu Chen, Zhenming Liu, Bin Ren, Shuaiwen Leon Song, Xipeng Shen, Xu Liu:
 Toward Efficient Interactions between Python and Native Libraries. CoRR abs/2107.00064 (2021)
 [i14]Anil Gaihre, Da Zheng, Scott Weitze, Lingda Li, Shuaiwen Leon Song, Caiwen Ding, Xiaoye S. Li, Hang Liu:
 Dr. Top-k: Delegate-Centric Top-k on GPUs. CoRR abs/2109.08219 (2021)
 [i13]Kiran Ranganath, Joshua D. Suetterlein, Joseph B. Manzano, Shuaiwen Leon Song, Daniel Wong:
 MAPA: Multi-Accelerator Pattern Allocation Policy for Multi-Tenant GPU Servers. CoRR abs/2110.03214 (2021)
 [i12]Qiyu Wan, Haojun Xia, Xingyao Zhang, Lening Wang, Shuaiwen Leon Song, Xin Fu:
 Shift-BNN: Highly-Efficient Probabilistic Bayesian Neural Network Training via Memory-Friendly Pattern Retrieving. CoRR abs/2110.03553 (2021)
 [i11]Sian Jin, Chengming Zhang, Xintong Jiang, Yunhe Feng, Hui Guan, Guanpeng Li, Shuaiwen Leon Song, Dingwen Tao:
 COMET: A Novel Memory-Efficient Deep Learning Training Framework by Using Error-Bounded Lossy Compression. CoRR abs/2111.09562 (2021)
- 2020
 [j11]Jingweijia Tan, Kaige Yan, Shuaiwen Leon Song, Xin Fu:
 Energy-Efficient GPU L2 Cache Design Using Instruction-Level Data Locality Similarity. ACM Trans. Design Autom. Electr. Syst. 25(6): 52:1-52:18 (2020)
 [j10]Ang Li, Shuaiwen Leon Song, Jieyang Chen, Jiajia Li, Xu Liu, Nathan R. Tallent, Kevin J. Barker:
 Evaluating Modern GPU Interconnect: PCIe, NVLink, NV-SLI, NVSwitch and GPUDirect. IEEE Trans. Parallel Distributed Syst. 31(1): 94-110 (2020)
 [c52]Xingyao Zhang, Shuaiwen Leon Song, Chenhao Xie, Jing Wang, Weigong Zhang, Xin Fu:
 Enabling Highly Efficient Capsule Networks Processing Through A PIM-Based Architecture Design. HPCA 2020: 542-555
 [i10]Chenhao Xie, Xin Fu, Mingsong Chen, Shuaiwen Leon Song:
 OO-VR: NUMA Friendly Object-Oriented VR Rendering Framework For Future NUMA-Based Multi-GPU Systems. CoRR abs/2001.03537 (2020)
 [i9]Cody Rivera, Jieyang Chen, Nan Xiong, Shuaiwen Leon Song, Dingwen Tao:
 ISM2: Optimizing Irregular-Shaped Matrix-Matrix Multiplication on GPUs. CoRR abs/2002.03258 (2020)
 [i8]Fangtian Zhong, Xiuzhen Cheng, Dongxiao Yu, Bei Gong, Shuaiwen Song, Jiguo Yu:
 MalFox: Camouflaged Adversarial Malware Example Generation Based on C-GANs Against Black-Box Detectors. CoRR abs/2011.01509 (2020)
 [i7]Sian Jin, Guanpeng Li, Shuaiwen Leon Song, Dingwen Tao:
 A Novel Memory-Efficient Deep Learning Training Framework via Error-Bounded Lossy Compression. CoRR abs/2011.09017 (2020)
 [i6]Chengming Zhang, Geng Yuan, Wei Niu, Jiannan Tian, Sian Jin, Donglin Zhuang, Zhe Jiang, Yanzhi Wang, Bin Ren, Shuaiwen Leon Song, Dingwen Tao:
 An Efficient End-to-End Deep Learning Training Framework via Fine-Grained Pattern-Based Pruning. CoRR abs/2011.10170 (2020)
 [i5]Chenhao Xie, Jieyang Chen, Jesun Sahariar Firoz, Jiajia Li, Shuaiwen Leon Song, Kevin J. Barker, Mark Raugas, Ang Li:
 Fast and Scalable Sparse Triangular Solver for Multi-GPU Based HPC Architectures. CoRR abs/2012.06959 (2020)
2010 – 2019
- 2019
 [j9]Kiran Ranganath, AmirAli Abdolrashidi, Shuaiwen Leon Song, Daniel Wong:
 Speeding up Collective Communications Through Inter-GPU Re-Routing. IEEE Comput. Archit. Lett. 18(2): 128-131 (2019)
 [c51]Tong Geng, Tianqi Wang, Chunshu Wu, Chen Yang, Shuaiwen Leon Song, Ang Li, Martin C. Herbordt:
 LP-BNN: Ultra-low-Latency BNN Inference with Layer Parallelism. ASAP 2019: 9-16
 [c50]Jingweijia Tan, Kaige Yan, Shuaiwen Leon Song, Xin Fu:
 LoSCache: Leveraging Locality Similarity to Build Energy-Efficient GPU L2 Cache. DATE 2019: 1190-1195
 [c49]Chenhao Xie, Xingyao Zhang, Ang Li, Xin Fu, Shuaiwen Song:
 PIM-VR: Erasing Motion Anomalies In Highly-Interactive Virtual Reality World with Customized Memory Cube. HPCA 2019: 609-622
 [c48]Chenhao Xie, Xin Fu, Mingsong Chen, Shuaiwen Leon Song:
 OO-VR: NUMA friendly object-oriented VR rendering framework for future NUMA-based multi-GPU systems. ISCA 2019: 53-65
 [c47]Ang Li, Tong Geng, Tianqi Wang, Martin C. Herbordt, Shuaiwen Leon Song, Kevin J. Barker:
 BSTC: a novel binarized-soft-tensor-core design for accelerating bit-based approximated neural nets. SC 2019: 38:1-38:30
 [i4]Ang Li, Shuaiwen Leon Song, Jieyang Chen, Jiajia Li, Xu Liu, Nathan R. Tallent, Kevin J. Barker:
 Evaluating Modern GPU Interconnect: PCIe, NVLink, NV-SLI, NVSwitch and GPUDirect. CoRR abs/1903.04611 (2019)
 [i3]Xingyao Zhang, Shuaiwen Leon Song, Chenhao Xie, Jing Wang, Weigong Zhang, Xin Fu:
 Enabling Highly Efficient Capsule Networks Processing Through A PIM-Based Architecture Design. CoRR abs/1911.03451 (2019)
- 2018
 [j8]Probir Roy, Shuaiwen Leon Song, Sriram Krishnamoorthy, Abhinav Vishnu, Dipanjan Sengupta, Xu Liu:
 NUMA-Caffe: NUMA-Aware Deep Learning Neural Networks. ACM Trans. Archit. Code Optim. 15(2): 24:1-24:26 (2018)
 [c46]Probir Roy, Shuaiwen Leon Song, Sriram Krishnamoorthy, Xu Liu:
 Lightweight detection of cache conflicts. CGO 2018: 200-213
 [c45]Du Shen, Shuaiwen Leon Song, Ang Li, Xu Liu:
 CUDAAdvisor: LLVM-based runtime profiling for modern GPUs. CGO 2018: 214-227
 [c44]Chenhao Xie, Xin Fu, Shuaiwen Song:
 Perception-Oriented 3D Rendering Approximation for Modern Graphics Processors. HPCA 2018: 362-374
 [c43]Ang Li, Weifeng Liu, Linnan Wang, Kevin J. Barker, Shuaiwen Leon Song:
 Warp-Consolidation: A Novel Execution Model for GPUs. ICS 2018: 53-64
 [c42]Ang Li, Shuaiwen Leon Song, Jieyang Chen, Xu Liu, Nathan R. Tallent, Kevin J. Barker:
 Tartan: Evaluating Modern GPU Interconnect via a Multi-GPU Benchmark Suite. IISWC 2018: 191-202
 [c41]Shuaiwen Leon Song, Natalie J. Bates, Ang Li:
 Introduction to HPPAC 2018. IPDPS Workshops 2018: 674
 [c40]Linnan Wang, Jinmian Ye, Yiyang Zhao, Wei Wu, Ang Li, Shuaiwen Leon Song, Zenglin Xu, Tim Kraska:
 Superneurons: dynamic GPU memory management for training deep neural networks. PPoPP 2018: 41-53
 [i2]Linnan Wang, Jinmian Ye, Yiyang Zhao, Wei Wu, Ang Li, Shuaiwen Leon Song, Zenglin Xu, Tim Kraska:
 SuperNeurons: Dynamic GPU Memory Management for Training Deep Neural Networks. CoRR abs/1801.04380 (2018)
- 2017
 [c39]Ang Li, Shuaiwen Leon Song, Weifeng Liu, Xu Liu, Akash Kumar, Henk Corporaal:
 Locality-Aware CTA Clustering for Modern GPUs. ASPLOS 2017: 297-311
 [c38]Chenhao Xie, Shuaiwen Leon Song, Jing Wang, Weigong Zhang, Xin Fu:
 Processing-in-Memory Enabled Graphics Processors for 3D Rendering. HPCA 2017: 637-648
 [c37]Junqiao Qiu, Zhijia Zhao, Bo Wu, Abhinav Vishnu, Shuaiwen Leon Song:
 Enabling scalability-sensitive speculative parallelization for FSM computations. ICS 2017: 2:1-2:10
 [c36]Shuaiwen Leon Song, Richard W. Vuduc:
 HPPAC Workshop Introduction. IPDPS Workshops 2017: 952
 [c35]Shuaiwen Leon Song, Torsten Hoefler:
 IPDRM Workshop Introduction. IPDPS Workshops 2017: 1284
 [c34]Ang Li, Wenfeng Zhao, Shuaiwen Leon Song:
 BVF: enabling significant on-chip power savings via bit-value-favor for throughput processors. MICRO 2017: 532-545
 [c33]Ang Li, Weifeng Liu, Mads Ruben Burgdorff Kristensen, Brian Vinter, Hao Wang, Kaixi Hou, Andres Marquez, Shuaiwen Leon Song:
 Exploring and analyzing the real impact of modern on-package memory on HPC scientific kernels. SC 2017: 26
 [c32]Ning Zhang, Chuntao Jiang, Xian-He Sun, Shuaiwen Leon Song:
 Evaluating GPGPU Memory Performance Through the C-AMAT Model. MCHPC@SC 2017: 35-39
 [c31]Dipanjan Sengupta, Shuaiwen Leon Song:
 EvoGraph: On-the-Fly Efficient Mining of Evolving Graphs on GPU. ISC 2017: 97-119
- 2016
 [j7]Li Tan, Zizhong Chen, Shuaiwen Leon Song:
 Scalable Energy Efficiency with Resilience for High Performance Computing Systems: A Quantitative Methodology. ACM Trans. Archit. Code Optim. 12(4): 35:1-35:27 (2016)
 [c30]Jingweijia Tan, Shuaiwen Leon Song, Kaige Yan, Xin Fu, Andrés Márquez, Darren J. Kerbyson:
 Combating the Reliability Challenge of GPU Register File at Low Supply Voltage. PACT 2016: 3-15
 [c29]Ang Li, Shuaiwen Leon Song, Akash Kumar, Eddy Z. Zhang, Daniel G. Chavarría-Miranda, Henk Corporaal:
 Critical points based register-concurrency autotuning for GPUs. DATE 2016: 1273-1278
 [c28]Dingwen Tao, Shuaiwen Leon Song, Sriram Krishnamoorthy, Panruo Wu, Xin Liang, Eddy Z. Zhang, Darren J. Kerbyson, Zizhong Chen:
 New-Sum: A Novel Online ABFT Scheme For General Iterative Methods. HPDC 2016: 43-55
 [c27]Probir Roy, Xu Liu, Shuaiwen Leon Song:
 SMT-Aware Instantaneous Footprint Optimization. HPDC 2016: 267-279
 [c26]Ang Li, Shuaiwen Leon Song, Mark Wijtvliet, Akash Kumar, Henk Corporaal:
 SFU-Driven Transparent Approximation Acceleration on GPUs. ICS 2016: 15:1-15:14
 [c25]Lingda Li, Ari B. Hayes, Shuaiwen Leon Song, Eddy Z. Zhang:
 Tag-Split Cache for Efficient GPGPU Cache Utilization. ICS 2016: 43:1-43:12
 [c24]Ang Li, Shuaiwen Leon Song, Eric Brugel, Akash Kumar, Daniel G. Chavarría-Miranda, Henk Corporaal:
 X: A Comprehensive Analytic Model for Parallel Machines. IPDPS 2016: 242-252
 [c23]Barry Rountree, Shuaiwen Leon Song:
 HPPAC Introduction and Committees. IPDPS Workshops 2016: 1089
 [c22]Shuaiwen Leon Song, Todd Gamblin:
 IPDRM Introduction and Committees. IPDPS Workshops 2016: 1726
 [c21]Ari B. Hayes, Lingda Li, Daniel G. Chavarría-Miranda, Shuaiwen Leon Song, Eddy Z. Zhang:
 Orion: A Framework for GPU Occupancy Tuning. Middleware 2016: 18
 [i1]Lingda Li, Ari B. Hayes, Stephen A. Hackler, Eddy Z. Zhang, Mario Szegedy, Shuaiwen Leon Song:
 A Graph-based Model for GPU Caching Problems. CoRR abs/1605.02043 (2016)
- 2015
 [j6]Yang You, Haohuan Fu, Shuaiwen Leon Song, Amanda Peters Randles, Darren J. Kerbyson, Andres Marquez, Guangwen Yang, Adolfy Hoisie:
 Scaling Support Vector Machines on modern HPC platforms. J. Parallel Distributed Comput. 76: 16-31 (2015)
 [c20]Sunil Shrestha, Joseph B. Manzano, Andrés Márquez, Stéphane Zuckerman, Shuaiwen Song, Guang R. Gao:
 Gregarious Data Re-structuring in a Many Core Architecture. HPCC/CSS/ICESS 2015: 712-720
 [c19]Chao Li, Shuaiwen Leon Song, Hongwen Dai, Albert Sidelnik, Siva Kumar Sastry Hari, Huiyang Zhou:
 Locality-Driven Dynamic GPU Cache Bypassing. ICS 2015: 67-77
 [c18]Dipanjan Sengupta, Kapil Agarwal, Shuaiwen Leon Song, Karsten Schwan:
 GraphReduce: Large-Scale Graph Analytics on Accelerator-Based HPC Systems. IPDPS Workshops 2015: 604-609
 [c17]Li Tan, Shuaiwen Leon Song, Panruo Wu, Zizhong Chen, Rong Ge, Darren J. Kerbyson:
 Investigating the Interplay between Energy Efficiency and Resilience in High Performance Computing. IPDPS 2015: 786-796
 [c16]Dipanjan Sengupta, Shuaiwen Leon Song, Kapil Agarwal, Karsten Schwan:
 GraphReduce: processing large-scale graphs on accelerator-based systems. SC 2015: 28:1-28:12
- 2014
 [j5]Yang You, Haohuan Fu, Shuaiwen Leon Song, Maryam Mehri Dehnavi, Lin Gan, Xiaomeng Huang, Guangwen Yang:
 Evaluating multi-core and many-core architectures through accelerating the three-dimensional Lax-Wendroff correction stencil. Int. J. High Perform. Comput. Appl. 28(3): 301-318 (2014)
 [j4]Bo Li, Hung-Ching Chang, Shuaiwen Song, Chun-Yi Su, Timmy Meyer, John Mooring, Kirk W. Cameron:
 Extending PowerPack for Profiling and Analysis of High-Performance Accelerator-Based Systems. Parallel Process. Lett. 24(4) (2014)
 [c15]Andres Marquez, Joseph B. Manzano, Shuaiwen Leon Song, Benoît Meister, Sunil Shrestha, Thomas St. John, Guang R. Gao:
 ACDT: Architected Composite Data Types trading-in unfettered data access for improved execution. ICPADS 2014: 289-297
 [c14]Yang You, Shuaiwen Leon Song, Darren J. Kerbyson:
 An adaptive cross-architecture combination method for graph traversal. ICS 2014: 169
 [c13]Yang You, Shuaiwen Leon Song, Haohuan Fu, Andres Marquez, Maryam Mehri Dehnavi, Kevin J. Barker, Kirk W. Cameron, Amanda Peters Randles, Guangwen Yang:
 MIC-SVM: Designing a Highly Efficient Support Vector Machine for Advanced Modern Multi-core and Many-Core Architectures. IPDPS 2014: 809-818
 [c12]Bo Li, Hung-Ching Chang, Shuaiwen Song, Chun-Yi Su, Timmy Meyer, John Mooring, Kirk W. Cameron:
 The Power-Performance Tradeoffs of the Intel Xeon Phi on HPC Applications. IPDPS Workshops 2014: 1448-1456
- 2013
 [j3]Abhinav Vishnu, Shuaiwen Song, Andres Marquez, Kevin J. Barker, Darren J. Kerbyson, Kirk W. Cameron, Pavan Balaji:
 Designing energy efficient communication runtime systems: a view from PGAS models. J. Supercomput. 63(3): 691-709 (2013)
 [c11]Bo Li, Shuaiwen Leon Song, Ivona Bezáková, Kirk W. Cameron:
 EDR: An energy-aware runtime load distribution system for data-intensive applications in the cloud. CLUSTER 2013: 1-8
 [c10]Shuaiwen Song, Chun-Yi Su, Barry Rountree, Kirk W. Cameron:
 A Simplified and Accurate Model of Power-Performance Efficiency on Emergent GPU Architectures. IPDPS 2013: 673-686
 [c9]Shuaiwen Leon Song, Kevin J. Barker, Darren J. Kerbyson:
 Unified performance and power modeling of scientific workloads. E2SC@SC 2013: 4:1-4:8
- 2012
 [c8]Shuaiwen Song, Kirk W. Cameron:
 System-level power-performance efficiency modeling for emergent GPU architectures. PACT 2012: 473-474
 [c7]Bo Li, Shuaiwen Song, Ivona Bezáková, Kirk W. Cameron:
 Energy-Aware Replica Selection for Data-Intensive Services in Cloud. MASCOTS 2012: 504-506
 [c6]Shuaiwen Leon Song, Chun-Yi Su, Barry Rountree, Kirk W. Cameron:
 Abstract: Three Steps to Model Power-Performance Efficiency for Emergent GPU-Based Parallel Systems. SC Companion 2012: 1344-1345
 [c5]Shuaiwen Leon Song:
 Poster: Three Steps to Model Power-Performance Efficiency for Emergent GPU-Based Parallel Systems. SC Companion 2012: 1346
- 2011
 [c4]Shuaiwen Song, Matthew Grove, Kirk W. Cameron:
 An ISO-Energy-Efficient Approach to Scalable System Power-Performance Optimization. CLUSTER 2011: 262-271
 [c3]Shuaiwen Song, Chun-Yi Su, Rong Ge, Abhinav Vishnu, Kirk W. Cameron:
 Iso-Energy-Efficiency: An Approach to Power-Constrained Parallel Computation. IPDPS 2011: 128-139
- 2010
 [j2]Rong Ge, Xizhou Feng, Shuaiwen Song, Hung-Ching Chang, Dong Li, Kirk W. Cameron:
 PowerPack: Energy Profiling and Analysis of High-Performance Systems and Applications. IEEE Trans. Parallel Distributed Syst. 21(5): 658-671 (2010)
 [c2]Abhinav Vishnu, Shuaiwen Song, Andres Marquez, Kevin J. Barker, Darren J. Kerbyson, Kirk W. Cameron, Pavan Balaji:
 Designing Energy Efficient Communication Runtime Systems for Data Centric Programming Models. GreenCom/CPSCom 2010: 229-236
 [c1]Abhinav Vishnu, Huub J. J. Van Dam, Wibe de Jong, Pavan Balaji, Shuaiwen Song:
 Fault-tolerant communication runtime support for data-centric programming models. HiPC 2010: 1-9
2000 – 2009
- 2009
 [j1]Shuaiwen Song, Rong Ge, Xizhou Feng, Kirk W. Cameron:
 Energy Profiling and Analysis of the HPC Challenge Benchmarks. Int. J. High Perform. Comput. Appl. 23(3): 265-276 (2009)
last updated on 2025-10-28 23:05 CET by the dblp team
 all metadata released as open data under CC0 1.0 license