


MMM 2025, Nara, Japan - Part V
- Ichiro Ide, Ioannis Kompatsiaris, Changsheng Xu, Keiji Yanai, Wei-Ta Chu, Naoko Nitta, Michael Riegler, Toshihiko Yamasaki:
MultiMedia Modeling - 31st International Conference on Multimedia Modeling, MMM 2025, Nara, Japan, January 8-10, 2025, Proceedings, Part V. Lecture Notes in Computer Science 15524, Springer 2025, ISBN 978-981-96-2073-9
Special Session on Multimedia Research in Robotics
- Jia Yap Lim, John See, Christian Dondrup:
Multimodal Engagement Prediction in Human-Robot Interaction Using Transformer Neural Networks. 3-17
- Daichi Yoshihara, Akishige Yuguchi, Seiya Kawano, Takamasa Iio, Koichiro Yoshino:
What Should Autonomous Robots Verbalize and What Should They Not? 18-29
Special Session on Spatial Intelligence in Multimedia Analytics
- Narges Ghasemi, Seon Ho Kim, Abdullah Alfarrarjeh, Cyrus Shahabi:
Counting Unique Objects in Geo-Tagged Street Images: A Case Study of Homeless Encampments in Los Angeles. 33-46
Special Session on Simulating Edge Computing and Multimodal AI: A Benchmark for Real-World Applications
- Duy-Dong Le, Duy-Thanh Huynh, Pham The Bao:
Correlation-Based Weighted Federated Learning with Multimodal Sensing and Knowledge Distillation: An Application on a Real-World Benchmark Dataset. 49-60
- Dang Vu, Tien Dang, Quoc-Trung Nguyen, Tan Pham:
Leveraging Pruning, Quantization and Multi-objective Optimization for an Efficient Deployment of Multi-modal Models. 61-73
Demo Papers
- Onanong Kongmeesub, Cathal Gurrin, Dongyun Nie:
A User Identification and Reading Style Detection System Based on Eye Movement Patterns While Reading. 77-83
- Ibrahim Serouis, Florence Sèdes:
AMDA: Advancing Multimedia Data Annotation for Human-Centric Situations. 84-90
- Tetsuro Kitahara, Takuya Tsutsumi, Takaaki Nagoshi, Taizan Suzuki:
An Implementation of Networked JamSketch. 91-97
- Duen-Chian Jheng, Bill Louis Harchan, Berenika Nawoja Kostka de Sztemberg, Jen-Hao Hsu, Min-Chun Tien:
Badminton Footwork Practice via an Immersive Virtual Reality System. 98-104
- Hanna Borgli, Håkon Kvale Stensland, Pål Halvorsen:
Better Image Segmentation with Classification: Guiding Zero-Shot Models Using Class Activation Maps. 105-111
- Yung-Chu Chiang, Zi-Xian Tang, Yi-Ching Luo, Jason S. Chang:
CleverFox: Integrating Visual Mnemonics with AI for Enhanced Language Learning. 112-118
- Ioannis Kontostathis, Evlampios Apostolidis, Konstantinos Apostolidis, Vasileios Mezaris:
Enhancing User Control in AI-Based Video Summarization for Social Media. 119-126
- Hung-Yao Peng, Zi-Heng Zhong, Cheng-Chih Tsai, Ching-Yeh Chiang, Tse-Yu Pan:
FencBuddy: Action-Aware Depth Perception Training for Fencing Attacks. 127-133
- Nami Iino, Akinaru Iino:
Fingering Prediction for Classical Guitar: Dataset Creation and Model Development. 134-141
- Honghui Yuan, Keiji Yanai:
KuzushijiFontDiff: Diffusion Model for Japanese Kuzushiji Font Generation. 142-149
- Bohan Li, Xingyi Li, Yangwen Liang, Shuangquan Wang, Kee-Bong Song:
Leveraging Latent Diffusion in 3D Gaussian Splatting for Novel View Synthesis. 150-157
- Wei-Lun Huang, Shintami Chusnul Hidayati, Tse-Yu Pan:
Movie Retrieval Systems Using Genre-Guided Multimodal Learning Techniques. 158-164
- Omar Shahbaz Khan, Aaron Duane, Hariz Hasnan, Noé Le Blavec, Pierre Ouvrard, Johan Verdon, Laurent d'Orazio, Constance Thierry, Björn Þór Jónsson:
Multi-Dimensional Exploration of Media Collection Metadata. 165-172
- Kelley Lynch, Kyeongmin Rim, Owen King, James Pustejovsky:
Multimodal Interoperability with the CLAMS Platform. 173-179
- Masatoshi Hamanaka:
Real-Time Visualizer for Turntablist Performance. 180-186
- Yasutomo Kawanishi, Yutaka Nakamura, Taiken Shintani, Carlos Toshinori Ishi, Seiya Kawano, Koichiro Yoshino, Takashi Minato, Michihiko Minoh:
RoboDJ: Live Commentary Robots System Driven by Physical- and Cyber-World Observations. 187-193
- Honghui Yuan, Keiji Yanai:
SceneTextStyler: Editing Text with Style Transformation. 194-201
- Jobin Idiculla Wattaseril, Jürgen Döllner:
SelectSum: Topic-Based Selective Summarization of Speech-Based Videos. 202-209
- Wenbin Gan, Minh-Son Dao, Koji Zettsu:
Smart Driving Assistance with Real-Time Risk Assessment and Personalized Driving Coaching to Enhance Road Safety. 210-217
- Jaime B. Fernandez, Muhammad Intizar Ali:
System Demo of Modeling Smart University Campus Virtual Environments. 218-224
- Martin Korb, Werner Bailer:
Training a Segmentation-Based Visual Anonymization Service for Street Scenes. 225-232
- Christian Limberg, Zhe Zhang, Marc A. Kastner:
Transformer-Based Audio Generation Conditioned by 2D Latent Maps: A Demonstration. 233-239
- Angel F. Garcia Contreras, Wen-Yu Chang, Seiya Kawano, Yun-Nung Chen, Koichiro Yoshino:
Using Language Models to Generate and Forget the Narrative Memories of an Assistive Robot. 240-247
- Kota Izumi, Keiji Yanai:
WaveFontStyler: Font Style Transfer Based on Sound. 248-254
Video Browser Showdown
- Mario Leopold, Klaus Schoeffmann:
DiveXplore at the Video Browser Showdown 2025. 257-263
- Ujjwal Sharma, Omar Shahbaz Khan, Stevan Rudinac, Björn Þór Jónsson:
Exquisitor at the Video Browser Showdown 2025: Unifying Conversational Search and User Relevance Feedback. 264-271
- Luca Rossetto, Ralph Gasser:
Feature-Driven Video Segmentation and Advanced Querying with vitrivr-Engine. 272-277
- Huy M. Le, Dat Nguyen Tien, Khang Le Duy, Tuan Nguyen Dang Quang, Nguyen Khanh Toan, Tuyen Nguyen, Binh T. Nguyen:
Fusionista: Fusion of 3-D Information of Video in Retrieval System. 278-285
- Tai Nguyen, Vo Ngoc Minh Anh, Duc Dat Pham, Tran Quang Vinh, Nhu Duong Thi Quynh, Le Anh Tien, Tan Duy Le, Binh T. Nguyen:
HORUS: Multimodal Large Language Models Framework for Video Retrieval at VBS 2025. 286-293
- Duc-Tuan Luu, Khanh-An C. Quan, Duy-Ngoc Nguyen, Khanh-Linh Bui-Le, Nhat-Sang Doan, Minh-Duc Le-Ngo, Vinh-Tiep Nguyen, Minh-Triet Tran:
IMSearch 2.0: Toward User-Centric and Efficient Interactive Multimedia Retrieval System. 294-301
- Yu-Tong Cheng, Jiaxin Wu, Zhixin Ma, Jiangshan He, Xiao-Yong Wei, Chong-Wah Ngo:
Interactive Video Search with Multi-modal LLM Video Captioning. 302-309
- Rahel Arnold, Rahel Kempf, Raphael Waltenspül, Heiko Schuldt:
MediaMix: Multimedia Retrieval in Mixed Reality. 310-317
- Bao Tran Gia, Tuong Bui Cong Khanh, Tam Le Thi Thanh, Thuyen Tran Doan, Khiem Le, Tien Do, Tien-Dung Mai, Thanh Duc Ngo, Duy-Dinh Le, Shin'ichi Satoh:
NII-UIT at VBS2025: Multimodal Video Retrieval with LLM Integration and Dynamic Temporal Search. 318-325
- Michael Stroh, Vojtech Kloda, Benjamin Verner, Zuzana Vopálková, Raphael Buchmüller, Bastian Jäckl, Jakub Hajko, Jakub Lokoc:
PraK Tool V3: Enhancing Video Item Search Using Localized Text and Texture Queries. 326-333
- Florian Spiess, Luca Rossetto, Heiko Schuldt:
Simplified Video Retrieval in Virtual Reality with vitrivr-VR. 334-338
- Minh-Quan Ho-Le, Duy-Khang Ho, Huy-Hoang Do-Huu, Nhut-Thanh Le-Hinh, Hoa-Vien Vo-Hoang, Van-Tu Ninh, Cathal Gurrin, Minh-Triet Tran:
SnapSeek 2.0 at Video Browser Showdown 2025. 339-346
- Thang-Long Nguyen-Ho, Viet-Tham Huynh, Onanong Kongmeesub, Minh-Triet Tran, Dongyun Nie, Graham Healy, Cathal Gurrin:
VEAGLE: Eye Gaze-Assisted Guidance for Video Browser Showdown. 347-354
- Nick Pantelidis, Dimitris Georgalis, Maria Pegia, Damianos Galanopoulos, Konstantinos Apostolidis, Klearchos Stavrothanasopoulos, Anastasia Moumtzidou, Konstantinos Gkountakos, Ilias Gialampoukidis, Stefanos Vrochidis, Vasileios Mezaris, Ioannis Kompatsiaris:
VERGE in VBS 2025. 355-362
- Quang-Linh Tran, Binh T. Nguyen, Gareth J. F. Jones, Cathal Gurrin:
VideoEase at VBS2025: An Interactive Video Retrieval System. 363-370
- Gia-Huy Vuong, Van-Son Ho, Tien-Thanh Nguyen-Dang, Xuan-Dang Thai, Minh-Quan Ho-Le, Tu-Khiem Le, Minh-Khoi Pham, Van-Tu Ninh, Cathal Gurrin, Minh-Triet Tran:
ViewsInsight2.0: Enhancing Video Retrieval for VBS 2025 with an Automatic Query Generator Powered by Large Language Models. 371-377
- Khanh-An C. Quan, Qui Ngoc Nguyen, Minh-Triet Tran:
ViFi: A Video Finding System at Video Browser Showdown 2025. 378-384















