


MMM 2025, Nara, Japan - Part IV
- Ichiro Ide, Ioannis Kompatsiaris, Changsheng Xu, Keiji Yanai, Wei-Ta Chu, Naoko Nitta, Michael Riegler, Toshihiko Yamasaki:
MultiMedia Modeling - 31st International Conference on Multimedia Modeling, MMM 2025, Nara, Japan, January 8-10, 2025, Proceedings, Part IV. Lecture Notes in Computer Science 15523, Springer 2025, ISBN 978-981-96-2070-8
Regular Papers
- Congjian Lu, Shuwang Zhou, Ke Shan, Hongkuan Zhang, Zhaoyang Liu:
SES-Net: Multi-dimensional Spot-Edge-Surface Network for Nuclei Segmentation. 3-15
- Zhuowei Chen, Mengqi Huang, Nan Chen, Zhendong Mao:
Skin-Adapter: Fine-Grained Skin-Color Preservation for Text-to-Image Generation. 16-29
- Yishan Lv, Jing Luo, Boyuan Ju, Xinyu Yang:
Small Tunes Transformer: Exploring Macro and Micro-level Hierarchies for Skeleton-Conditioned Melody Generation. 30-43
- Yongliang Zhang, Jing Liu:
SMG-Diff: Adversarial Attack Method Based on Semantic Mask-Guided Diffusion. 44-57
- Ding-Chi Chang, Shiou-Chi Li, Jen-Wei Huang:
SPLGAN-TTS: Learning Semantic and Prosody to Enhance the Text-to-Speech Quality of Lightweight GAN Models. 58-70
- Hui Zhao, Na Qi, Qing Zhu, Xiumin Lin:
SSCDUF: Spatial-Spectral Correlation Transformer Based on Deep Unfolding Framework for Hyperspectral Image Reconstruction. 71-84
- Nikhil Sharma, Changchang Sun, Zhenghao Zhao, Anne Hee Hiong Ngu, Hugo Latapie, Yan Yan:
SSDL: Sensor-to-Skeleton Diffusion Model with Lipschitz Regularization for Human Activity Recognition. 85-99
- Zhiyi Fang, Yi Qian, Xiyue Dai:
Structural Information-Guided Fine-Grained Texture Image Inpainting. 100-113
- Lingyi Lu, Xin Xu, Xiao Wang:
Style Separation and Content Recovery for Generalizable Sketch Re-identification and a New Benchmark. 114-127
- Daniele Bonatto, Sarah Fachada, Jaime Sancho, Eduardo Juárez, Gauthier Lafruit, Mehrdad Teratani:
Synchronization and Calibration of Video Sequences Acquired Using Multiple Plenoptic 2.0 Cameras. 128-140
- Zihao Suo, Shanliang Pan:
Target-Oriented Dynamic Denosing Curriculum Learning for Multimodel Stance Detection. 141-154
- Yizhou Li, Zihua Liu, Yusuke Monno, Masatoshi Okutomi:
TDM: Temporally-Consistent Diffusion Model for All-in-One Real-World Video Restoration. 155-169
- Shuhei Yamamoto, Noriko Kando:
Temporal Closeness for Enhanced Cross-Modal Retrieval of Sensor and Image Data. 170-183
- Bjørn Aslak Juliussen:
The Right to an Explanation Under the GDPR and the AI Act. 184-197
- Joshua Springer, Gylfi Þór Guðmundsson, Marcel Kyas:
Toward Appearance-Based Autonomous Landing Site Identification for Multirotor Drones in Unstructured Environments. 198-211
- Saumya Yadav, Élise Lincker, Caroline Huron, Stéphanie Martin, Camille Guinaudeau, Shin'ichi Satoh, Jainendra Shukla:
Towards Inclusive Education: Multimodal Classification of Textbook Images for Accessibility. 212-225
- Itthisak Phueaksri, Marc A. Kastner, Yasutomo Kawanishi, Takahiro Komamizu, Ichiro Ide:
Towards Visual Storytelling by Understanding Narrative Context Through Scene-Graphs. 226-239
- Li Yao, Qianni Huang, Yan Wan:
TPS-YOLO: The Efficient Tiny Person Detection Network Based on Improved YOLOv8 and Model Pruning. 240-252
- Junjian Chen, Xuan Yang:
Uncertainty-Guided Joint Semi-supervised Segmentation and Registration of Cardiac Images. 253-267
- Qian Cao, Ruihua Song, Xu Chen:
Understanding the Roles of Visual Modality in Multimodal Dialogue: An Empirical Study. 268-282
- Sotirios Papadopoulos, Konstantinos Ioannidis, Stefanos Vrochidis, Ioannis Kompatsiaris, Ioannis Patras:
Vision-Language Pretraining for Variable-Shot Image Classification. 283-297
- Yu Li, Zhenping Xie:
Visual Anomaly Detection on Topological Connectivity Under Improved YOLOv8. 298-310
- Takamasa Terada, Masahiro Toyoura:
Wavelet Integrated Convolutional Neural Network for ECG Signal Denoising. 311-324
- Feng Li, Jiusong Luo, Wanjun Xia:
WavFusion: Towards Wav2vec 2.0 Multimodal Speech Emotion Recognition. 325-336
- Weijie Wu, Jun Li, Zhijian Wu, Jianhua Xu:
Zero-Shot Sketch-Based Image Retrieval with Hybrid Information Fusion and Sample Relationship Modeling. 337-350
Special Session: ExpertSUM: Special Session on Expert-Level Text Summarization from Fine-Grained Multimedia Analytics
- Hikaru Tanabe, Keiji Yanai:
CalorieVoL: Integrating Volumetric Context Into Multimodal Large Language Models for Image-Based Calorie Estimation. 353-365
- Takumi Fukuzawa, Kensho Hara, Hirokatsu Kataoka, Toru Tamaki:
Can Masking Background and Object Reduce Static Bias for Zero-Shot Action Recognition? 366-379
Special Session: MLLMA: Special Session on Multimodal Large Language Models and Applications
- Su Li, Liang Wang, Jianye Wang, Ziheng Zhang, Junjun Zhang, Lei Zhang:
Enhanced Anomaly Detection in 3D Motion Through Language-Inspired Occlusion-Aware Modeling. 383-397
- Khanh-An C. Quan, Camille Guinaudeau, Shin'ichi Satoh:
Evaluating VQA Models' Consistency in the Scientific Domain. 398-412
- Jia-Hong Huang, Hongyi Zhu, Yixian Shen, Stevan Rudinac, Evangelos Kanoulas:
Image2Text2Image: A Novel Framework for Label-Free Evaluation of Image-to-Text Generation with Text-to-Image Diffusion Models. 413-427
- Chihaya Matsuhira, Marc A. Kastner, Takahiro Komamizu, Takatsugu Hirayama, Ichiro Ide:
Quantifying Image-Adjective Associations by Leveraging Large-Scale Pretrained Models. 428-441
- Wei Wei, Bingkun Zhang, Yibing Wang:
TACST: Time-Aware Transformer for Robust Speech Emotion Recognition. 442-453
- Wei Wei, Bingkun Zhang, Yibing Wang:
TS-MEFM: A New Multimodal Speech Emotion Recognition Network Based on Speech and Text Fusion. 454-467















