


ICMR 2024: Phuket, Thailand
- Cathal Gurrin, Rachada Kongkachandra, Klaus Schoeffmann, Duc-Tien Dang-Nguyen, Luca Rossetto, Shin'ichi Satoh, Liting Zhou: Proceedings of the 2024 International Conference on Multimedia Retrieval, ICMR 2024, Phuket, Thailand, June 10-14, 2024. ACM 2024
Regular Long Papers
- Xinzhe Ni, Yong Liu, Hao Wen, Yatai Ji, Jing Xiao, Yujiu Yang: Multimodal Prototype-Enhanced Network for Few-Shot Action Recognition. 1-10
- Kaixing Yang, Xukun Zhou, Xulong Tang, Ran Diao, Hongyan Liu, Jun He, Zhaoxin Fan: BeatDance: A Beat-Based Model-Agnostic Contrastive Learning Framework for Music-Dance Retrieval. 11-19
- Yang Xu, Yifan Feng, Lin Bie: Triadic Elastic Structure Representation for Open-Set Incremental 3D Object Retrieval. 20-28
- Stephan Repp, Ernst Georg Haffner: Dynamic Segmentation for Efficient Retrieval of Podcasts: The Repping Algorithm. 29-36
- Zhaoxin Fan, Fengxin Li, Hongyan Liu, Jun He, Xiaoyong Du: PoseRec: 3D Human Pose Driven Online Advertisement Recommendation for Micro-videos. 37-45
- Xiaoyu Qiu, Hao Feng, Yuechen Wang, Wengang Zhou, Houqiang Li: Progressive Multi-modal Conditional Prompt Tuning. 46-54
- Zhaoxin Fan, Zhenbo Song, Zhicheng Wang, Jian Xu, Kejian Wu, Hongyan Liu, Jun He: ACR-Pose: Adversarial Canonical Representation Reconstruction Network for Category Level 6D Object Pose Estimation. 55-63
- Yunfeng Yu, Longlong Lin, Qiyu Liu, Zeli Wang, Xi Ou, Tao Jia: GSD-GNN: Generalizable and Scalable Algorithms for Decoupled Graph Neural Networks. 64-72
- Jiaxin Wu, Chong-Wah Ngo, Wing-Kwong Chan: Improving Interpretable Embeddings for Ad-hoc Video Search with Generative Captions and Multi-word Concept Bank. 73-82
- Hua Gao, Chenchen Hu, Guang Han, Jiafa Mao, Wei Huang, Kaiyuan Wan: HashNeck is a Boosting Tool for Deep Learning to Hashing. 83-91
- Di Wang, Feng Yan, Yifeng Wang, Lin Zhao, Xiao Liang, Haodi Zhong, Ronghua Zhang: Fine-grained Semantics-aware Representation Learning for Text-based Person Retrieval. 92-100
- Guangzhe Zhao, Yanan Liu, Xueping Wang, Feihu Yan: CMFF-Face: Attention-Based Cross-Modal Feature Fusion for High-Quality Audio-Driven Talking Face Generation. 101-110
- Meng Wei, Zhongnian Li, Yong Zhou, Xinzheng Xu: Learning from Reduced Labels for Long-Tailed Data. 111-119
- Tianyi Wang, Shenghua Zhong: Fingerprinting in EEG Model IP Protection Using Diffusion Model. 120-128
- Weixing Liu, Shenghua Zhong: MarginFinger: Controlling Generated Fingerprint Distance to Classification boundary Using Conditional GANs. 129-136
- Chuang Zhao, Hefei Ling, Shijie Lu, Yuxuan Shi, Jiazhong Chen, Ping Li: Improve Deep Hashing with Language Guidance for Unsupervised Image Retrieval. 137-145
- Yue Yang, Liangjun Ke: Exploiting Degradation Prior for Personalized Federated Learning in Real-World Image Super-Resolution. 146-154
- Hui Liu, Xiaojun Wan: QAVidCap: Enhancing Video Captioning through Question Answering Techniques. 155-164
- Fanlei Meng, Xiangru Chen, Yuan Cao: Targeted Universal Adversarial Attack on Deep Hash Networks. 165-174
- Feifei Fu, Yizhao Gao, Zhiwu Lu: Enhancing Class-Incremental Learning for Image Classification via Bidirectional Transport and Selective Momentum. 175-183
- Mingzhe Yu, Yunshan Ma, Lei Wu, Kai Cheng, Xue Li, Lei Meng, Tat-Seng Chua: Smart Fitting Room: A One-stop Framework for Matching-aware Virtual Try-On. 184-192
- Mingyue Li, Yuting Zhu, Ruizhong Du, Chunfu Jia: Secure Verification Encrypted Image Retrieval Scheme with Addition Homomorphic Bitmap Index. 193-201
- Xingquan Cai, Haoyu Zhang, Shanshan He, Haoyu Song, Haiyan Sun: A Novel Auxiliary Task Framework in 3D Human Pose Estimation for Opera Videos. 202-210
- Donghuo Zeng, Yanan Wang, Kazushi Ikeda, Yi Yu: Anchor-aware Deep Metric Learning for Audio-visual Retrieval. 211-219
- Jiaao Yu, Yunlai Ding, Junyu Dong, Yuezun Li: Dynamic Soft Labeling for Visual Semantic Embedding. 220-228
- Feifei Xu, Ziheng Yu: Navigating Style Variations in Scene Text Image Super-Resolution through Multi-Scale Perception. 229-238
- Depei Liu, Hongjie Fan, Junfei Liu: ExpoGenius: Robust Personalized Human Image Generation using Diffusion Model for Exposure Variation and Pose Transfer. 239-247
- Xudong Ru, Haichuan Zhao, Xingce Wang, Zhongke Wu, Shaolong Liu, Yi-Cheng Zhu, Alejandro F. Frangi: Vector-Aware Anisotropic Gauge Equivariant Mesh Convolution Network for 3D Aneurysm Detection. 248-256
- Junming Wang, Yi Shi: NeurNCD: Novel Class Discovery via Implicit Neural Representation. 257-265
- Lin Bie, Siqi Li, Kai Cheng: Image-to-Point Registration via Cross-Modality Correspondence Retrieval. 266-274
- Lilong Wen, Xiu Tang, Dongxiang Zhang: TWIST: Text-only Weakly Supervised Scene Text Spotting Using Pseudo Labels. 275-284
- Xintao Jiao, Jiansheng Chen, Jiale Liu: A Graph Convolution Network with a POS-aware Filter and Context Enhancement Mechanism for Event Detection. 285-292
- Florian Spiess, Nicolas Scharowski, Ariane Haller, Zgjim Memeti, Heiko Schuldt, Florian Brühlmann: Bringing Video Browsing to Virtual Reality: Empirical Evaluation of a Novel Multimedia Drawer. 293-301
- Changgu Chen, Yang Li, Jian Zhang, Jiali Liu, Changbo Wang: Generative Data Augmentation with Liveness Information Preserving for Face Anti-Spoofing. 302-310
- Lucas Joos, Bastian Jäckl, Daniel A. Keim, Maximilian T. Fischer, Ladislav Peska, Jakub Lokoc: Known-Item Search in Video: An Eye Tracking-Based Study. 311-319
- Huixia Ben, Shuo Wang, Meng Wang, Richang Hong: Pseudo Content Hallucination for Unpaired Image Captioning. 320-329
- Haiyang Zheng, Ruilin Zhang, Hongpeng Wang: Deep Image Clustering Based on Curriculum Learning and Density Information. 330-338
- Jiaxin Li, Zhihan Yu, Guibo Luo, Yuesheng Zhu: CodeDetector: Revealing Forgery Traces with Codebook for Generalized Deepfake Detection. 339-347
- Zeli Wang, Jian Li, Shuyin Xia, Longlong Lin, Guoyin Wang: Text Adversarial Defense via Granular-Ball Sample Enhancement. 348-356
- Zeli Wang, Tuo Zhang, Shuyin Xia, Longlong Lin, Guoyin Wang: GBRAIN: Combating Textual Label Noise by Granular-ball based Robust Training. 357-365
- Wei Tang, Yuanyi Wang: Multi-modal Entity Alignment via Position-enhanced Multi-label Propagation. 366-375
- Zuheng Kang, Yayun He, Botao Zhao, Xiaoyang Qu, Junqing Peng, Jing Xiao, Jianzong Wang: Retrieval-Augmented Audio Deepfake Detection. 376-384
- Yongcheng Zhang, Lingou Kong, Sheng Tian, Hao Fei, Changpeng Xiang, Huan Wang, Xiaomei Wei: Multi-view Counterfactual Contrastive Learning for Fact-checking Fake News Detection. 385-393
- Danyang Hou, Liang Pang, Huawei Shen, Xueqi Cheng: Improving Video Corpus Moment Retrieval with Partial Relevance Enhancement. 394-403
- Albatool Wazzan, Imtiaz Ahmad, Stephen MacNeil, Richard Souvenir: Context or Clutter? Efficiently Matching Objects Across Scenes. 404-413
- Tianpeng Zhang, Xuesong Jiang: A Lightweight Surface Defect Segmentation Network with External Semantics and High-frequency Information. 414-422
- Zhenghao Zhao, Hao Tang, Joy Wan, Yan Yan: Monocular Expressive 3D Human Reconstruction of Multiple People. 423-432
- Mei Yu, Xiaoxi Zhou, Mankun Zhao, Tianyi Xu, Yue Zhao, Ruiguo Yu, Xuewei Li: A Causal View for Multi-Interest User Modeling in News Recommendation. 433-441
- Yang Liu, Tongfei Shen, Dong Zhang, Qingying Sun, Shoushan Li, Guodong Zhou: Comment-aided Video-Language Alignment via Contrastive Pre-training for Short-form Video Humor Detection. 442-450
- Yichen Yan, Xingjian He, Sihan Chen, Jing Liu: Calibration & Reconstruction: Deeply Integrated Language for Referring Image Segmentation. 451-459
- Thao-Nhu Nguyen, Zongyao Li, Satoshi Yamazaki, Jianquan Liu, Cathal Gurrin: A Parallel Transformer Framework for Video Moment Retrieval. 460-468
- Pengfei Wei, Hongjun Ouyang, Qintai Hu, Bi Zeng, Guang Feng, Qingpeng Wen: VEC-MNER: Hybrid Transformer with Visual-Enhanced Cross-Modal Multi-level Interaction for Multimodal NER. 469-477
- Weiwei Zhou, Guoqiang Xiao, Michael S. Lew, Song Wu: Causal Inference-based Few-Shot Class-Incremental Learning. 478-487
- Zixin Tang, Haihui Fan, Xiaoyan Gu, Yang Li, Bo Li, Xin Wang: ELSEIR: A Privacy-Preserving Large-Scale Image Retrieval Framework for Outsourced Data Sharing. 488-496
- Yijing Zhao, Yuchao Xia, Yi Ding, Yumeng Liu, Shuai Liu, Hongan Wang: S2F-Net: Shared-Specific Fusion Network for Infrared and Visible Image Fusion. 497-505
- Gullal S. Cheema, Judi Arafat, Chiao-I Tseng, John A. Bateman, Ralph Ewerth, Eric Müller-Budack: Identification of Speaker Roles and Situation Types in News Videos. 506-514
- Tianwei Chen, Noa Garcia, Liangzhi Li, Yuta Nakashima: Retrieving Emotional Stimuli in Artworks. 515-523
- Pengfei Wei, Zhaokang Huang, Hongjun Ouyang, Qintai Hu, Bi Zeng, Guang Feng: CGI-MRE: A Comprehensive Genetic-Inspired Model For Multimodal Relation Extraction. 524-532
- Chenxiao Liu, Zheyong Xie, Sirui Zhao, Jin Zhou, Tong Xu, Minglei Li, Enhong Chen: Speak From Heart: An Emotion-Guided LLM-Based Multimodal Method for Emotional Dialogue Generation. 533-542
- Zhirui Kuai, Yulu Zhou, Qi Xie, Li Kuang: Multi-Source Augmentation and Composite Prompts for Visual Recognition with Missing Modality. 543-551
- Xiangyu Liu, Yanlei Shang, Yong Chen: TriMPL: Masked Multi-Prompt Learning with Knowledge Mixing for Vision-Language Few-shot Learning. 552-560
- Zhongnian Li, Peng Ying, Meng Wei, Tongfeng Sun, Xinzheng Xu: Prompt Expending for Single Positive Multi-Label Learning with Global Unannotated Categories. 561-569
- Yaqun Fang, Yi Shi, Jia Bei, Tongwei Ren: Semantic-guided RGB-Thermal Crowd Counting with Segment Anything Model. 570-578
- Ruiqi Wu, Bingliang Jiao, Wenxuan Wang, Meng Liu, Peng Wang: Enhancing Visible-Infrared Person Re-identification with Modality- and Instance-aware Visual Prompt Learning. 579-588
- Zhenyu Xie, Huanyu He, Gui Zou, Jie Wu, Guoliang Liu, Jun Zhao, Yingxue Wang, Hui Lin, Weiyao Lin: Visibility-guided Human Body Reconstruction from Uncalibrated Multi-view Cameras. 589-598
- Yilin Li, Tszyin Guo, Ying Qiao, Zitong Bo, Hongan Wang: FEST: A Multi-way Framework with Enhanced Spatial-Temporal Modeling for Traffic Forecasting. 599-607
- Yuchen Niu, Min Zhu, Zhihua Wei: SamCap: Energy-based Controllable Image Captioning by Gradient-Based Sampling. 608-617
- Zhuoyuan Wei, Xun Jiang, Zheng Wang, Fumin Shen, Xing Xu: PTAN: Principal Token-aware Adjacent Network for Compositional Temporal Grounding. 618-627
- Chao Ye, Qian Wang, Lanfang Dong: A Hybrid Few-Shot Image Classification Framework Combining Gaussian Modeling and Label Propagation. 628-637
- Shizhou Huang, Bo Xu, Changqun Li, Jiabo Ye, Xin Lin: A Sentimental Prompt Framework with Visual Text Encoder for Multimodal Sentiment Analysis. 638-646
- Zhikai Hu, Yiu-ming Cheung, Yonggang Zhang, Peiying Zhang, Puiling Tang: Component-Level Oracle Bone Inscription Retrieval. 647-656
- Nico Hezel, Kai Uwe Barthel, Konstantin Schall, Klaus Jung: An Exploration Graph with Continuous Refinement for Efficient Multimedia Retrieval. 657-665
- Siqi Wei, Bin Wu: Intra and Inter-modality Incongruity Modeling and Adversarial Contrastive Learning for Multimodal Fake News Detection. 666-674
- Kaixing Yang, Xulong Tang, Ran Diao, Hongyan Liu, Jun He, Zhaoxin Fan: CoDancers: Music-Driven Coherent Group Dance Generation with Choreographic Unit. 675-683
- Yuwen Yang, Yuxiang Lu, Suizhi Huang, Shalayiding Sirejiding, Hongtao Lu, Yue Ding: Federated Multi-Task Learning on Non-IID Data Silos: An Experimental Study. 684-693
- Xiaoqian Liang, Jianji Wang, Yuanliang Lu, Xubin Duan, Xichun Liu, Nanning Zheng: Refracting Once is Enough: Neural Radiance Fields for Novel-View Synthesis of Real Refractive Objects. 694-703
- Bo Li, You Wu, Zhixin Li: Team HUGE: Image-Text Matching via Hierarchical and Unified Graph Enhancing. 704-712
- Peijia Chen, Ke Qi, Xi Tao, Wenhao Xu, Jingdong Zhang: MFVG: A Visual Grounding Network with Multi-scale Fusion. 713-721
- Zhijian Wu, Wenhui Liu, Dingjiang Huang: When Handcrafted Filter Meets CNN: A Lightweight Conv-Filter Mixer Network for Efficient Image Super-Resolution. 722-730
- Dahuang Liu, Jiuxiang You, Guobo Xie, Lap-Kei Lee, Fu Lee Wang, Zhenguo Yang: Modality-specific and -shared Contrastive Learning for Sentiment Analysis. 731-739
- Zhuohua Li, Ruyun Wang, Fuqing Zhu, Jizhong Han, Songlin Hu: Pyramidal Cross-Modal Transformer with Sustained Visual Guidance for Multi-Label Image Classification. 740-748
- Xuanhao Qi, Min Zhi, Yanjun Yin, Ping Ping, Yuening Zhang: SFAM: Lightweight Spectrum Unreferenced Attention Network. 749-757
- Ioannis Sarridis, Christos Koutlis, Symeon Papadopoulos, Christos Diou: FaceX: Understanding Face Attribute Classifiers through Summary Model Explanations. 758-766
- Weipeng Yang, Hongxia Gao, Wenbin Zou, Tongtong Liu, Shasha Huang, Jianliang Ma: Low-Light Image Enhancement via Weighted Low-Rank Tensor Regularized Retinex Model. 767-775
- Lai Wei, Shanshan Song: Multi-view Subspace Clustering via An Adaptive Consensus Graph Filter. 776-784
- Ruihai Wu, Yourong Zhang, Yu Qi, Andy Guanhong Chen, Hao Dong: Pattern4Ego: Learning Egocentric Video Representation Using Cross-video Activity Patterns. 785-794
- Xigang Bao, Mengyuan Tian, Luyao Wang, Zhiyuan Zha, Biao Qin: Contrastive Pre-training with Multi-level Alignment for Grounded Multimodal Named Entity Recognition. 795-803
- Jian Yang, Weize Quan, Zhen Shen, Dong-Ming Yan, Huaiyu Wu: Neural Parametric Human Hand Modeling with Point Cloud Representation. 804-813
- Yi Li, Qingmeng Zhu, Changwen Zheng, Jiangmeng Li: MSI: Multi-modal Recommendation via Superfluous Semantics Discarding and Interaction Preserving. 814-823
- Chao He, Hongxi Wei: HybridHash: Hybrid Convolutional and Self-Attention Deep Hashing for Image Retrieval. 824-832
- Lisong Ou, Zhixin Li: Modeling Multi-Task Joint Training of Aggregate Networks for Multi-Modal Sarcasm Detection. 833-841
- Ziyu Gong, Chengcheng Mai, Yihua Huang: ML2MG-VLCR: A Multimodal LLM Guided Zero-shot Method for Visio-linguistic Compositional Reasoning with Autoregressive Generative Language Model. 842-850
- Ziqing Deng, Zhihui Lai, Yujuan Ding, Heng Kong, Xu Wu: Deep Scaling Factor Quantization Network for Large-scale Image Retrieval. 851-859
- Yan Wang, Yawen Zeng, Junjie Liang, Xiaofen Xing, Jin Xu, Xiangmin Xu: RetrievalMMT: Retrieval-Constrained Multi-Modal Prompt Learning for Multi-Modal Machine Translation. 860-868
- Runlai Hao, Jinlong Li, Qiuju Chen, Huanhuan Chen: DualStyle3D: Real-time Exemplar-based Artistic Portrait View Synthesis Based on Radiance Field. 869-877
- Jiancheng Huang, Mingfu Yan, Yifan Liu, Shifeng Chen: SBCR: Stochasticity Beats Content Restriction Problem in Training and Tuning Free Image Editing. 878-887
- Shenghao Liu, Yuqin Lan, Xianjun Deng, Lingzhi Yi, Chenlu Zhu, Laurence T. Yang, Jong Hyuk Park: TrustGo: Trust Mining and Multi-semantic Regularization in Social Recommendation. 888-896
- Beiqi Liu, Fuqing Duan, Junli Zhao: SkeletonFormer: Point Cloud Completion with Dynamic Selective Skeleton Points. 897-905
- Chen Huang, Zhijun Fan, Kui Xiao, Yan Zhang, Shihui Wang, Jianhua Song, Wei Wu, Chao Liu: Research on Epilepsy Classification Model Based on Variational Mode Quadratic Decomposition. 906-914
- Xukun Zhou, Zhenbo Song, Jun He, Hongyan Liu, Zhaoxin Fan: STDG: Semi-Teacher-Student Training Paradigm for Depth-guided One-stage Scene Graph Generation. 915-924
- Anrui Wang, Libo Weng, Fei Gao: BFIDet: A YOLOv7-improved Vehicle and Pedestrian Detector via Balancing Feature Integration. 925-933
- Chun-Yen Chen, Mei-Chen Yeh: Self-Supervised Multi-Label Classification with Global Context and Local Attention. 934-942
- Tianlong Zhang, Jing Lv, Ming Yang: Semi-Parametric Style Transfer with Multi-Perspective Feature Fusion and Information-Guided Alignment. 943-950
- Kontawat Wisetpaitoon, Sattaya Singkul, Theerat Sakdejayont, Tawunrat Chalothorn: End-to-End Thai Text-to-Speech with Linguistic Unit. 951-959
- Linhao Zhou, Sheng-Hua Zhong, Zhijiao Xiao: Discovering Multi-Relational Integration for Knowledge Tracing with Retentive Networks. 960-968
- Qin Jiang, Qinglin Wang, Lihua Chi, Wentao Ma, Feng Li, Jie Liu: DeepEnhancer: Temporally Consistent Focal Transformer for Comprehensive Video Enhancement. 969-977
- Hongyi Zhu, Jia-Hong Huang, Stevan Rudinac, Evangelos Kanoulas: Enhancing Interactive Image Retrieval With Query Rewriting Using Large Language Models and Vision Language Models. 978-987
- Yitong Xing, Guoqiang Xiao, Michael S. Lew, Song Wu: Lifelong Visible-Infrared Person Re-Identification via a Tri-Token Transformer with a Query-Key Mechanism. 988-997
- Wenzhuo Li, Yinghui Wang, Wei Li, Liangyi Huang, Kamoliddin Shukurov, Mingfeng Wang: Wireless Capsule Endoscope Low-light Image Enhancement with Balanced Brightness and Saturation. 998-1005
- Sohail Ahmed Khan, Duc-Tien Dang-Nguyen: CLIPping the Deception: Adapting Vision-Language Models for Universal Deepfake Detection. 1006-1015
- Boyue Xu, Ruichao Hou, Tongwei Ren, Gangshan Wu: RGB-D Video Object Segmentation via Enhanced Multi-store Feature Memory. 1016-1024
- Yongkang Ding, Anqi Wang, Liyan Zhang: Multidimensional Semantic Disentanglement Network for Clothes-Changing Person Re-Identification. 1025-1033
- Yuting Mei, Linli Yao, Qin Jin: UBiSS: A Unified Framework for Bimodal Semantic Summarization of Videos. 1034-1042
- Ali Abdari, Alex Falcon, Giuseppe Serra: AdOCTeRA: Adaptive Optimization Constraints for improved Text-guided Retrieval of Apartments. 1043-1050
- Ruiting Dai, Yuqiao Tan, Lisi Mo, Shuang Liang, Guohao Huo, Jiayi Luo, Yao Cheng: G-SAP: Graph-based Structure-Aware Prompt Learning over Heterogeneous Knowledge for Commonsense Reasoning. 1051-1060
- Minyang Xu, Yunzhong Lou, Weijian Ma, Xueyang Li, Xiangdong Zhou: Parametric CAD Primitive Retrieval via Multi-Modal Fusion and Deep Hashing. 1061-1069
- Lai Wei, Mingyuan Xi: Subspace Clustering with A Hybrid Adaptive Graph Filter. 1070-1078
Regular Short Papers
- Cencen Liu, Dongyang Zhang, Ke Qin: Knowledge Distillation for Single Image Super-Resolution via Contrastive Learning. 1079-1083
- Yuhang Zheng, Zhen Wang, Long Chen: Improving Data Augmentation for Robust Visual Question Answering with Effective Curriculum Learning. 1084-1088
- Shuyang Zhang, Liangwu Wei, Qingyu Wang, Yuntao Wei, Yanzhi Song: CLCP: Realtime Text-Image Retrieval for Retailing via Pre-trained Clustering and Priority Queue. 1089-1093
- Mengzhu Yu, Zhenjun Tang, Huijiang Zhuang, Xiaoping Liang, Zhixin Li, Xianquan Zhang: Robust Video Hashing with Non-negative Tensor Factorization for Copy Detection. 1094-1098
- Yihua Chen, Xiaoping Liang, Mengzhu Yu, Zhenjun Tang: Unifying Pictorial and Textual Features for Screen Content Image Quality Evaluation. 1099-1103
- Mingyong Li, Zongwei Zhao, Xiaolong Jiang, Zheng Jiang: CLIP-ProbCR: CLIP-based Probability embedding Combination Retrieval. 1104-1109
- Peihao Li, Jie Huang, Shuaishuai Zhang, Chunyang Qi: Proactive Privacy and Intellectual Property Protection of Multimedia Retrieval Models in Edge Intelligence. 1110-1114
- Ruonan Zhang, Xiaohang Liu, Ge Li, Thomas H. Li, Pengjun Zhao: Sketch-aided Interactive Fusion Point Cloud Place Recognition. 1115-1119
- Huxiao Ji, Haitao Yang, Linchuan Li, Shunyu Zhang, Cunyi Zhang, Xuanping Li, Wenwu Ou: TIM: Temporal Interaction Model in Notification System. 1120-1124
- Quan Li, Xike Xie, Chao Wang, Jiali Weng: Local Deep Learning Quantization for Approximate Nearest Neighbor Search. 1125-1129
- Pengfei Zhou, Fangxiang Feng, Xiaojie Wang: DiffHarmony: Latent Diffusion Model Meets Image Harmonization. 1130-1134
- Haoran Tong, Xinyan Liu, Guorong Li, Laiyun Qing: Directly Locating Actions in Video with Single Frame Annotation. 1135-1139
- Ruoxi Sun, Xinyu Yang, Cong Qian, Chenyu Zhu, Wei Sui, Zeyd Boukhers, Cong Yang: YawnNet: A Visual-Centric Approach for Yawning Detection. 1140-1144
- Eisaku Yoshikawa, Keishi Tajima: Content-Based Exclusion Queries in Keyword-Based Image Retrieval. 1145-1149
- Zhikang Zhang, Zhongjie Zhu, Yongqiang Bai, Ming Wang, Zhijing Yu: Octree-Retention Fusion: A High-Performance Context Model for Point Cloud Geometry Compression. 1150-1154
- Zhuo Lei, Qiang Yu, Lidan Shou, Shengquan Li, Yunqing Mao: A GAN based Video Summarization Method with Representation Loss. 1155-1159
- Sherzod Hakimov, Gullal S. Cheema: Unveiling Global Narratives: A Multilingual Twitter Dataset of News Media on the Russo-Ukrainian Conflict. 1160-1164
- Minh-Son Dao, Koji Zettsu: Near-Miss Accident Prediction on the Edge: A Real-Time System for Safer Driving. 1165-1169
- Qinghua Sun, Jia Cui, Zhenyu Gu: Extending CLIP for Text-to-font Retrieval. 1170-1174
- Xitie Zhang, Suping Wu: CLTalk: Speech-Driven 3D Facial Animation with Contrastive Learning. 1175-1179
- Chih-Pin Tan, Shuen-Huei Guan, Yi-Hsuan Yang: PiCoGen: Generate Piano Covers with a Two-stage Approach. 1180-1184
- Yueying Feng, Fan Ma, Wang Lin, Chang Yao, Jingyuan Chen, Yi Yang: FedPAM: Federated Personalized Augmentation Model for Text-to-Image Retrieval. 1185-1189
Brave New Ideas Papers
- Lorin Sweeney, Graham Healy, Alan F. Smeaton: Reconciling the Rift Between Recognition and Recall: Insights from a Video Memorability Drawing Experiment. 1190-1198
- Kai Uwe Barthel, Florian Tim Barthel, Peter Eisert, Nico Hezel, Konstantin Schall: Creating Sorted Grid Layouts with Gradient-based Optimization. 1199-1206
- Christian Limberg, Zhe Zhang: Mapping the Audio Landscape for Innovative Music Sample Generation. 1207-1213
Doctoral Symposium Papers
- Jia-Hong Huang: Multi-modal Video Summarization. 1214-1218
- Maria Eirini Pegia: Multimodality in Media Retrieval. 1219-1223
Reproducibility Track Papers
- Shuiying Liao, Yujuan Ding, P. Y. Mok, Qiushi Huang, Jialun Cao: Reproducibility Companion Paper: Recommendation of Mix-and-Match Clothing by Modeling Indirect Personal Compatibility. 1224-1227
- Yankun Wu, Yuta Nakashima, Noa Garcia, Sheng Li, Zhaoyang Zeng: Reproducibility Companion Paper: Stable Diffusion for Content-Style Disentanglement in Art Analysis. 1228-1231
- Fan Yu, Beibei Zhang, Yaqun Fang, Jia Bei, Tongwei Ren, Jiyi Li, Luca Rossetto: Reproducibility Companion Paper of "MMSF: A Multimodal Sentiment-Fused Method to Recognize Video Speaking Style". 1232-1235
Technical Demonstrations
- Luca Rossetto: OpenLifelogCam - A Low-Cost Open-Source Wearable Camera Platform. 1236-1240
- Panumate Chetprayoon, Sakol Tasanangam, Gayatri Tirumalasetty, Thanatwit Angsarawanee, Paveen Virameteekul, Wadeepas Lertwatanawanich, Theerat Sakdejayont: CarAI: Car Inspection with Artificial Intelligence. 1241-1245
- Kuo-Yu Liu, Ting-Yu Guo, Ta-Shan Pan, Ping-Yi Tung, Yi-Rou Lin: AI Batting Buddy: A Computational and Kinematic Approach for Enhancing Batting Performance and Analysis in Baseball. 1246-1250
- Supatta Viriyavisuthisakul, Parinya Sanguansat, Toshihiko Yamasaki: A Web Demo Interface for Super-Resolution Reconstruction with Parametric Regularization Loss. 1251-1254
- Quang-Linh Tran, Binh T. Nguyen, Gareth J. F. Jones, Cathal Gurrin: MemoriLens: a Low-cost Lifelog Camera Using Raspberry Pi Zero. 1255-1259
- Maria Eirini Pegia, Dimitris Georgalis, Nick Pantelidis, Björn Þór Jónsson, Anastasia Moumtzidou, Sotiris Diplaris, Ilias Gialampoukidis, Stefanos Vrochidis, Ioannis Kompatsiaris: 3DMSE: An Interactive 3D Media Search Engine. 1260-1264
- Daniel D. Braghis, Haiming Liu: Conversational Image Search: A Sketch-based Approach. 1265-1269
- Wang Xia, Guodao Sun, Zihao Zhu, Pan Liang, Sujia Zhu, Yiming Wu, Haoran Liang, Ronghua Liang: RE-IDVIS: Person Re-Identification System based on Interactive Visualization. 1270-1274
Challenge Papers
- Duc-Tien Dang-Nguyen, Sohail Ahmed Khan, Michael Riegler, Pål Halvorsen, Anh-Duy Tran, Minh-Son Dao, Minh-Triet Tran: Overview of the Grand Challenge on Detecting Cheapfakes at ACM ICMR 2024. 1275-1281
- Hoa-Vien Vo-Hoang, Long-Khanh Pham, Minh-Son Dao: Detecting Out-of-Context Media with LLaMa-Adapter V2 and RoBERTa: An Effective Method for Cheapfakes Detection. 1282-1287
- Long-Khanh Pham, Hoa-Vien Vo-Hoang, Anh-Duy Tran: A Generative Adaptive Context Learning Framework for Large Language Models in Cheapfake Detection. 1288-1293
- Anh-Thu Le, Minh-Dat Nguyen, Minh-Son Dao, Anh-Duy Tran, Duc-Tien Dang-Nguyen: TeGA: A Text-Guided Generative-based Approach in Cheapfake Detection. 1294-1299
- Van-Loc Nguyen, Bao-Tin Nguyen, Thanh-Son Nguyen, Duc-Tien Dang-Nguyen, Minh-Triet Tran: A Unified Network for Detecting Out-Of-Context Information Using Generative Synthetic Data. 1300-1305
- Dang Vu, Minh-Nhat Nguyen, Quoc-Trung Nguyen: Enhancing Cheapfake Detection: An Approach Using Prompt Engineering and Interleaved Text-Image Model. 1306-1311
- Jangwon Seo, Hyo-Seok Hwang, Jiyoung Lee, Minhyeok Lee, Wonsuk Kim, Junhee Seok: A Multi-Stage Deep Learning Approach Incorporating Text-Image and Image-Image Comparisons for Cheapfake Detection. 1312-1316
Invited Talks Abstracts
- Alan F. Smeaton: The LLM Wrecking Ball: Are We About to Lose Decades of Work in Multimedia because of MM-LLMs? 1317
- Yi-Ping Phoebe Chen: Diversity in Multimedia. 1318
Tutorial Abstracts
- Frank Sommers, Alisa Kongthon, Sarawoot Kongyoung: Fine-Tuning Large Language Models for Private Document Retrieval: A Tutorial. 1319-1320
- Vinh Dang, Thanh-Son Nguyen, Minh-Triet Tran, Duc-Tien Dang-Nguyen: Detecting Misinformation in Photos Utilizing Reverse Image Search. 1321-1323
- Maria Pegia, Sotiris Diplaris, Stefanos Vrochidis, Heiko Schuldt, Florian Spiess, Rahel Arnold, Werner Bailer: Multimedia Retrieval in and for XR. 1324-1325
- Shiqi Wang, Xinfeng Zhang: Compact Visual Data Representation for Multimedia Search and Analytics. 1326-1327
Workshop Abstracts
- Tai Tan Mai, Quang-Linh Tran, Ly-Duyen Tran, Van-Tu Ninh, Duc-Tien Dang-Nguyen, Cathal Gurrin: The First ACM Workshop on AI-Powered Question Answering Systems for Multimedia. 1328-1329
- Mahasak Ketcham, Kanyalag Phodong, Patiyuth Pramkeaw, Worawut Yimyam, Narumol Chumuang, Pokpong Songmuang, Thittaporn Ganokratanaa: AI-SIPM 2024: International Workshop on Artificial Intelligence for Signal, Image Processing and Multimedia. 1330-1331
- Minh-Son Dao, Michael Alexander Riegler, Duc-Tien Dang-Nguyen, Hanh-Nhi Tran, Rage Uday Kiran, Takahiro Komamizu: ICDAR 24: Intelligent Cross-Data Analysis and Retrieval. 1332-1333
- Cathal Gurrin, Liting Zhou, Graham Healy, Werner Bailer, Duc-Tien Dang-Nguyen, Steve Hodges, Björn Þór Jónsson, Jakub Lokoc, Luca Rossetto, Minh-Triet Tran, Klaus Schöffmann: Introduction to the Seventh Annual Lifelog Search Challenge, LSC'24. 1334-1335
- Zhedong Zheng, Yaxiong Wang, Xuelin Qian, Zhun Zhong, Zheng Wang, Liang Zheng: MORE'24 Multimedia Object Re-ID: Advancements, Challenges, and Opportunities. 1336-1338
- Cristian Lucian Stanciu, Bogdan Ionescu, Luca Cuccovillo, Symeon Papadopoulos, Giorgos Kordopatis-Zilos, Adrian Popescu, Roberto Caldelli: MAD '24 Workshop: Multimedia AI against Disinformation. 1339-1341
- Marc A. Kastner, Gullal S. Cheema, Sherzod Hakimov, Noa Garcia: MUWS 2024: The 3rd International Workshop on Multimodal Human Understanding for the Web and Social Media. 1342-1344
- Hui Wang, Josef Kittler, Mark J. F. Gales, Rob Cooper, Maurice D. Mulvenna, Wing W. Y. Ng, Yang Hua, Richard Gault, Abbas Haider, Guanfeng Wu: MVRMLM 2024: Multimodal Video Retrieval and Multimodal Language Modelling. 1345-1346
- Hongzhang Mu, Shuili Zhang, Hongbo Xu: A Knowledge-Driven Approach to Enhance Topic Modeling with Multi-Modal Representation Learning. 1347-1355
