


31st Euro-Par 2025: Dresden, Germany - Part II
- Wolfgang E. Nagel, Diana Goehringer, Pedro C. Diniz:
Euro-Par 2025: Parallel Processing - 31st European Conference on Parallel and Distributed Processing, Dresden, Germany, August 25-29, 2025, Proceedings, Part II. Lecture Notes in Computer Science 15901, Springer 2026, ISBN 978-3-031-99856-0
Architectures and Accelerators
- Jiangying Xue, Tianyu Xiong, Lingwei Chao, Ruini Xue:
SimPoint+: More Stable, Accurate and Efficient Program Analysis. 3-17
- Yi Luo, Yaobin Wang, Qi Wang, Yingchen Song, Huan Wu, Qingfeng Wang, Jun Huang:
DCI: An Efficient Workload-Aware Dual-Cache Allocation GNN Inference Acceleration System. 18-32
- Ruimin Shi, Gabin Schieffer, Maya B. Gokhale, Pei-Hung Lin, Hiren D. Patel, Ivy Peng:
ARM SVE Unleashed: Performance and Insights Across HPC Applications on Nvidia Grace. 33-47
- Joonyup Kwon, Jinhyeok Choi, Ngoc-Son Pham, Sangwon Shin, Taeweon Suh:
SkipNZ: Non-zero Value Skipping for Efficient CNN Acceleration. 48-59
- Jiale Dong, Hao Wu, Zihao Wang, Wenqi Lou, Zhendong Zheng, Lei Gong, Chao Wang, Xuehai Zhou:
CoQMoE: Co-Designed Quantization and Computation Orchestration for Mixture-of-Experts Vision Transformer on FPGA. 60-74
- Yudong Mu, Zhihua Fan, Xiaoxia Yao, Wenming Li, Zhiyuan Zhang, Honglie Wang, Xuejun An, Xiaochun Ye:
FDHA: Fusion-Driven Heterogeneous Accelerator for Efficient Diffusion Model Inference. 75-88
- Zhenxuan Xiong, Libo Huang, Ling Yang, Hui Guo, Junhui Wang, Zhong Zheng, Songwen Pei, Gang Chen, Yongwen Wang:
SONet: Towards Practical Online Neural Network for Enhancing Hard-to-Predict Branches. 89-102
- Piyumal Ranawaka, Per Stenström:
BATCH-DNN: Adaptive and Dynamic Batching for Multi-DNN Accelerators. 103-117
- Mengyue Xi, Jingyi He, Xianwei Zhang:
CacheC: LLM-Based GPU Cache Management to Enhance Kernel Concurrency. 118-131
- Hao Lan, Ziang Zhou, Qi Zhu, Wei Yan, Qinfen Hao, Xiaochun Ye, Yong Liu, Ninghui Sun:
ParTEE: A Framework for Secure Parallel Computing of RISC-V Trusted Execution Environments. 132-145
- Jin Pu, Shengan Zheng, Penghao Sun, Guifeng Wang, Xin Xie, Linpeng Huang:
CSGC: Collaborative File System Garbage Collection with Computational Storage. 146-160
- Zhaoyang Zeng, Yujuan Tan, Jiali Li, Zhuoxin Bai, Jun Liu, Kan Zhong, Duo Liu, Ao Ren:
Cocache: An Accurate and Low-Overhead Dynamic Caching Method for GNNs. 161-174
- Kazi Asifuzzaman, Aaron R. Young, Prasanna Date, Shruti R. Kulkarni, Narasinga Rao Miniskar, Matthew J. Marinella, Jeffrey S. Vetter:
ReSpike: A Co-Design Framework for Evaluating SNNs on ReRAM-Based Neuromorphic Processors. 175-189
Data analytics, AI, and Computational Science
- Xuanzheng Wang, Shuo Miao, Zihan Zhu, Peng Qu, Youhui Zhang:
AlphaSparseTensor: Discovering Faster Sparse Matrix Multiplication Algorithms on GPUs for LLM Inference. 193-206
- Zhichen Feng, Xin Zhang:
DiffNO: Neural Operator Learning Using Physically Structured Constrained Diffusion Model. 207-220
- Xinjue Zheng, Zhangqiang Ming, Yuchong Hu, Chenxuan Yao, Wenxiang Zhou, Rui Wang, Xun Chen, Dan Feng:
Saving Memory via Residual Reduction for DNN Training with Compressed Communication. 221-235
- Jacob Garby, Philippas Tsigas:
Interval-Asynchrony: Delimited Intervals of Localised Asynchrony for Fast Parallel SGD. 236-249
- Matyás Brabec, Jirí Klepl, Michal Töpfer, Martin Krulis:
Tutoring LLM into a Better CUDA Optimizer. 250-263
- Rubayet Rahman Rongon, Xuechen Zhang:
iAug: Accelerating Augmentation with Importance Sampling in Deep Neural Network Training. 264-277
- Nicolás Hernández, Pedro A. Toledo, Vicente Blanco, Francisco Almeida:
2:4 Pruning on Edge Devices: Performance, Energy Efficiency and Accuracy. 278-291
- Ao Chen, Guangli Li, Feng Yu, Xueying Wang, Jiacheng Zhao, Huimin Cui, Xiaobing Feng, Jingling Xue:
TopServe: Task-Operator Co-scheduling for Efficient Multi-DNN Inference Serving on GPUs. 292-305
- Sanjif Shanmugavelu, Mathieu Taillefumier, Christopher Culver, Vijay Ganesh, Oscar R. Hernandez, Ada Sedova:
Robustness of Deep Learning Classification to Adversarial Input on GPUs: Asynchronous Parallel Accumulation Is a Source of Vulnerability. 306-320
- Loris Belcastro, Paolo Ferragina, Giovanni Manzini, Fabrizio Marozzo, Domenico Talia, Paolo Trunfio:
Scalable Compression of Massive Data Collections on HPC Systems. 321-334
- Tianyu Guo, Hande Dong, Yichong Leng, Feng Liu, Cheater Lin, Nong Xiao, Xianwei Zhang:
EFIM: Efficient Serving of LLMs for Infilling Tasks with Improved KV Cache Reuse. 335-348
- Cheng Gu, Gang Li, Xuan Zhang, Jiayao Ling, Xiaolong Lin, Zhuoran Song, Jian Cheng, Xiaoyao Liang:
Light-DiT: An Importance-Aware Dynamic Compression Framework for Diffusion Transformers. 349-364
- Sabtain Ahmad, Thomas Schneidergruber, Ivona Brandic, Johannes Scholz:
On-Device Federated Learning for Remote Alpine Livestock Monitoring. 365-379
