Shengen Yan
2020 – today
- 2024
- [j9]Jiangfei Duan, Xiuhong Li, Ping Xu, Xingcheng Zhang, Shengen Yan, Yun Liang, Dahua Lin:
Proteus: Simulating the Performance of Distributed DNN Training. IEEE Trans. Parallel Distributed Syst. 35(10): 1867-1878 (2024)
- [c21]Shiyao Li, Xuefei Ning, Luning Wang, Tengxuan Liu, Xiangsheng Shi, Shengen Yan, Guohao Dai, Huazhong Yang, Yu Wang:
Evaluating Quantized Large Language Models. ICML 2024
- [i15]Tao Yuan, Xuefei Ning, Dong Zhou, Zhijie Yang, Shiyao Li, Minghui Zhuang, Zheyue Tan, Zhuyu Yao, Dahua Lin, Boxun Li, Guohao Dai, Shengen Yan, Yu Wang:
LV-Eval: A Balanced Long-Context Benchmark with 5 Length Levels Up to 256K. CoRR abs/2402.05136 (2024)
- [i14]Shiyao Li, Xuefei Ning, Luning Wang, Tengxuan Liu, Xiangsheng Shi, Shengen Yan, Guohao Dai, Huazhong Yang, Yu Wang:
Evaluating Quantized Large Language Models. CoRR abs/2402.18158 (2024)
- [i13]Enshu Liu, Junyi Zhu, Zinan Lin, Xuefei Ning, Matthew B. Blaschko, Sergey Yekhanin, Shengen Yan, Guohao Dai, Huazhong Yang, Yu Wang:
Linear Combination of Saved Checkpoints Makes Consistency and Diffusion Models Better. CoRR abs/2404.02241 (2024)
- [i12]Zixuan Zhou, Xuefei Ning, Ke Hong, Tianyu Fu, Jiaming Xu, Shiyao Li, Yuming Lou, Luning Wang, Zhihang Yuan, Xiuhong Li, Shengen Yan, Guohao Dai, Xiao-Ping Zhang, Yuhan Dong, Yu Wang:
A Survey on Efficient Inference for Large Language Models. CoRR abs/2404.14294 (2024)
- [i11]Si Xu, Zixiao Huang, Yan Zeng, Shengen Yan, Xuefei Ning, Haolin Ye, Sipei Gu, Chunsheng Shui, Zhezheng Lin, Hao Zhang, Sheng Wang, Guohao Dai, Yu Wang:
HetHub: A Heterogeneous distributed hybrid training system for large-scale models. CoRR abs/2405.16256 (2024)
- [i10]Tianchen Zhao, Xuefei Ning, Tongcheng Fang, Enshu Liu, Guyue Huang, Zinan Lin, Shengen Yan, Guohao Dai, Yu Wang:
MixDQ: Memory-Efficient Few-Step Text-to-Image Diffusion Models with Metric-Decoupled Mixed Precision Quantization. CoRR abs/2405.17873 (2024)
- [i9]Tianchen Zhao, Tongcheng Fang, Enshu Liu, Rui Wan, Widyadewi Soedarmadji, Shiyao Li, Zinan Lin, Guohao Dai, Shengen Yan, Huazhong Yang, Xuefei Ning, Yu Wang:
ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation. CoRR abs/2406.02540 (2024)
- [i8]Zhihang Yuan, Pu Lu, Hanling Zhang, Xuefei Ning, Linfeng Zhang, Tianchen Zhao, Shengen Yan, Guohao Dai, Yu Wang:
DiTFastAttn: Attention Compression for Diffusion Transformer Models. CoRR abs/2406.08552 (2024)
- [i7]Tianyu Fu, Haofeng Huang, Xuefei Ning, Genghan Zhang, Boju Chen, Tianqi Wu, Hongyi Wang, Zixiao Huang, Shiyao Li, Shengen Yan, Guohao Dai, Huazhong Yang, Yu Wang:
MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression. CoRR abs/2406.14909 (2024)
- [i6]Enshu Liu, Junyi Zhu, Zinan Lin, Xuefei Ning, Matthew B. Blaschko, Shengen Yan, Guohao Dai, Huazhong Yang, Yu Wang:
Efficient Expert Pruning for Sparse Mixture-of-Experts Language Models: Enhancing Performance and Reducing Inference Costs. CoRR abs/2407.00945 (2024)
- 2023
- [c20]Size Zheng, Siyuan Chen, Peidi Song, Renze Chen, Xiuhong Li, Shengen Yan, Dahua Lin, Jingwen Leng, Yun Liang:
Chimera: An Analytical Optimizing Framework for Effective Compute-intensive Operators Fusion. HPCA 2023: 1113-1126
- [i5]Jiangfei Duan, Xiuhong Li, Ping Xu, Xingcheng Zhang, Shengen Yan, Yun Liang, Dahua Lin:
Proteus: Simulating the Performance of Distributed DNN Training. CoRR abs/2306.02267 (2023)
- 2022
- [j8]Peng Sun, Yonggang Wen, Ruobing Han, Wansen Feng, Shengen Yan:
GradientFlow: Optimizing Network Performance for Large-Scale Distributed DNN Training. IEEE Trans. Big Data 8(2): 495-507 (2022)
- [j7]Lipeng Wang, Qiong Luo, Shengen Yan:
DIESEL+: Accelerating Distributed Deep Learning Tasks on Image Datasets. IEEE Trans. Parallel Distributed Syst. 33(5): 1173-1184 (2022)
- [j6]Zhisheng Ye, Peng Sun, Wei Gao, Tianwei Zhang, Xiaolin Wang, Shengen Yan, Yingwei Luo:
Astraea: A Fair Deep Learning Scheduler for Multi-Tenant GPU Clusters. IEEE Trans. Parallel Distributed Syst. 33(11): 2781-2793 (2022)
- [j5]Size Zheng, Renze Chen, Yicheng Jin, Anjiang Wei, Bingyang Wu, Xiuhong Li, Shengen Yan, Yun Liang:
NeoFlow: A Flexible Framework for Enabling Efficient Compilation for High Performance DNN Training. IEEE Trans. Parallel Distributed Syst. 33(11): 3220-3232 (2022)
- [c19]Lijuan Jiang, Ping Xu, Qianchao Zhu, Xiuhong Li, Shengen Yan, Xingcheng Zhang, Dahua Lin, Wenjing Ma, Zhouyang Li, Jun Liu, Jinming Ma, Minxi Jin, Chao Yang:
EasyView: Enabling and Scheduling Tensor Views in Deep Learning Compilers. ICPP 2022: 54:1-54:11
- [c18]Xiuhong Li, Shengen Yan, Lijuan Jiang, Ping Xu, Jinming Ma, Xingcheng Zhang, Dahua Lin:
LongTail-Bench: A Benchmark Suite for Domain-Specific Operators in Deep Learning. IISWC 2022: 282-295
- [c17]Size Zheng, Renze Chen, Anjiang Wei, Yicheng Jin, Qin Han, Liqiang Lu, Bingyang Wu, Xiuhong Li, Shengen Yan, Yun Liang:
AMOS: enabling automatic mapping for tensor computations on spatial accelerators with hardware abstraction. ISCA 2022: 874-887
- [i4]Ruofan Liang, Bingsheng He, Shengen Yan, Peng Sun:
A Simulation Platform for Multi-tenant Machine Learning Services on Thousands of GPUs. CoRR abs/2201.03175 (2022)
- 2021
- [c16]Qinghao Hu, Peng Sun, Shengen Yan, Yonggang Wen, Tianwei Zhang:
Characterization and prediction of deep learning workloads in large-scale GPU datacenters. SC 2021: 104
- [i3]Qinghao Hu, Peng Sun, Shengen Yan, Yonggang Wen, Tianwei Zhang:
Characterization and Prediction of Deep Learning Workloads in Large-Scale GPU Datacenters. CoRR abs/2109.01313 (2021)
- 2020
- [j4]Liancheng Jia, Yun Liang, Xiuhong Li, Liqiang Lu, Shengen Yan:
Enabling Efficient Fast Convolution Algorithms on GPUs via MegaKernels. IEEE Trans. Computers 69(7): 986-997 (2020)
- [j3]Yun Liang, Liqiang Lu, Qingcheng Xiao, Shengen Yan:
Evaluating Fast Algorithms for Convolutional Neural Networks on FPGAs. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 39(4): 857-870 (2020)
- [c15]Lei Xie, Jidong Zhai, Baodong Wu, Yuanbo Wang, Xingcheng Zhang, Peng Sun, Shengen Yan:
Elan: Towards Generic and Efficient Elastic Training for Deep Learning. ICDCS 2020: 78-88
- [c14]Lipeng Wang, Qiong Luo, Shengen Yan:
Accelerating Deep Learning Tasks with Optimized GPU-assisted Image Decoding. ICPADS 2020: 274-281
- [c13]Lipeng Wang, Songgao Ye, Baichen Yang, Youyou Lu, Hequan Zhang, Shengen Yan, Qiong Luo:
DIESEL: A Dataset-Based Distributed Storage and Caching System for Large-Scale Deep Learning Training. ICPP 2020: 20:1-20:11
2010 – 2019
- 2019
- [j2]Yiran Zhang, Long Chen, Xiangzhe An, Shengen Yan:
Study on Performance Optimization of Reduction Algorithm Targeting GPU Computing Platform (in Chinese). Computer Science (计算机科学) 46(2): 306-314 (2019)
- [c12]Xiuhong Li, Yun Liang, Shengen Yan, Liancheng Jia, Yinghan Li:
A coordinated tiling and batching framework for efficient GEMM on GPUs. PPoPP 2019: 229-241
- [i2]Peng Sun, Wansen Feng, Ruobing Han, Shengen Yan, Yonggang Wen:
Optimizing Network Performance for Distributed DNN Training on GPU Clusters: ImageNet/AlexNet Training in 1.5 Minutes. CoRR abs/1902.06855 (2019)
- 2017
- [c11]Qingcheng Xiao, Yun Liang, Liqiang Lu, Shengen Yan, Yu-Wing Tai:
Exploring Heterogeneous Algorithms for Accelerating Deep Convolutional Neural Networks on FPGAs. DAC 2017: 62:1-62:6
- [c10]Liqiang Lu, Yun Liang, Qingcheng Xiao, Shengen Yan:
Evaluating Fast Algorithms for Convolutional Neural Networks on FPGAs. FCCM 2017: 101-108
- [c9]Peng Sun, Yonggang Wen, Ta Nguyen Binh Duong, Shengen Yan:
Towards Distributed Machine Learning in Shared Clusters: A Dynamically-Partitioned Approach. SMARTCOMP 2017: 1-6
- [i1]Peng Sun, Yonggang Wen, Ta Nguyen Binh Duong, Shengen Yan:
Towards Distributed Machine Learning in Shared Clusters: A Dynamically-Partitioned Approach. CoRR abs/1704.06738 (2017)
- 2016
- [j1]Yunquan Zhang, Shigang Li, Shengen Yan, Huiyang Zhou:
A Cross-Platform SpMV Framework on Many-Core Architectures. ACM Trans. Archit. Code Optim. 13(4): 33:1-33:25 (2016)
- [c8]Peng Sun, Yonggang Wen, Ta Nguyen Binh Duong, Shengen Yan:
Timed Dataflow: Reducing Communication Overhead for Distributed Machine Learning Systems. ICPADS 2016: 1110-1117
- 2014
- [c7]Qingqing Dang, Shengen Yan, Ren Wu:
A fast integral image generation algorithm on GPUs. ICPADS 2014: 624-631
- [c6]Chao Li, Yi Yang, Hongwen Dai, Shengen Yan, Frank Mueller, Huiyang Zhou:
Understanding the tradeoffs between software-managed vs. hardware-managed caches in GPUs. ISPASS 2014: 231-242
- [c5]Shengen Yan, Chao Li, Yunquan Zhang, Huiyang Zhou:
yaSpMV: yet another SpMV framework on GPUs. PPoPP 2014: 107-118
- 2013
- [c4]Weiyan Wang, Yunquan Zhang, Guoping Long, Shengen Yan, Haipeng Jia:
CLSIFT: An Optimization Study of the Scale Invariance Feature Transform on GPUs. HPCC/EUC 2013: 93-100
- [c3]Shengen Yan, Guoping Long, Yunquan Zhang:
StreamScan: fast scan algorithms for GPUs without global barrier synchronization. PPoPP 2013: 229-238
- 2012
- [c2]Haipeng Jia, Yunquan Zhang, Guoping Long, Jianliang Xu, Shengen Yan, Yan Li:
GPURoofline: A Model for Guiding Performance Optimizations on GPUs. Euro-Par 2012: 920-932
- [c1]Haipeng Jia, Yunquan Zhang, Guoping Long, Shengen Yan:
An Insightful Program Performance Tuning Chain for GPU Computing. ICA3PP (1) 2012: 502-516
last updated on 2024-09-11 00:34 CEST by the dblp team
all metadata released as open data under CC0 1.0 license