31st ACM Multimedia 2023: Ottawa, ON, Canada
- Abdulmotaleb El-Saddik, Tao Mei, Rita Cucchiara, Marco Bertini, Diana Patricia Tobon Vallejo, Pradeep K. Atrey, M. Shamim Hossain:
Proceedings of the 31st ACM International Conference on Multimedia, MM 2023, Ottawa, ON, Canada, 29 October 2023- 3 November 2023. ACM 2023
Keynote Talks
- Chang Wen Chen:
Internet of Video Things: Technical Challenges and Emerging Applications. 1-2
- Alejandro Jaimes:
Multimodal AI & LLMs for Peacekeeping and Emergency Response. 3-4
- Ralf Steinmetz:
Transition and Adaptability: The Cornerstone of Resilience in Future Networked Multimedia Systems and Beyond. 5-6
Oral Session I: Understanding Multimedia Content -- Media Interpretation
- Hao Shen, Zhong-Qiu Zhao, Yulun Zhang, Zhao Zhang:
Mutual Information-driven Triple Interaction Network for Efficient Image Dehazing. 7-16
- Yang Jiao, Zequn Jie, Jingjing Chen, Lin Ma, Yu-Gang Jiang:
Suspected Objects Matter: Rethinking Model's Prediction for One-stage Visual Grounding. 17-26
- Sophyani Banaamwini Yussif, Ning Xie, Yang Yang, Heng Tao Shen:
Self-Relational Graph Convolution Network for Skeleton-Based Action Recognition. 27-36
- Qian Ning, Fangfang Wu, Weisheng Dong, Xin Li, Guangming Shi:
Exploring Correlations in Degraded Spatial Identity Features for Blind Face Restoration. 37-45
- Chuhao Zhou, Jinxing Li, Huafeng Li, Guangming Lu, Yong Xu, Min Zhang:
Video-based Visible-Infrared Person Re-Identification via Style Disturbance Defense and Dual Interaction. 46-55
- Wenmiao Hu, Yichen Zhang, Yuxuan Liang, Xianjing Han, Yifang Yin, Hannes Kruppa, See-Kiong Ng, Roger Zimmermann:
PetalView: Fine-grained Location and Orientation Extraction of Street-view Images via Cross-view Local Search. 56-66
- Haorui Wang, Yibo Hu, Yangfu Zhu, Jinsheng Qi, Bin Wu:
Shifted GCN-GAT and Cumulative-Transformer based Social Relation Recognition for Long Videos. 67-76
- Jilong Wang, Saihui Hou, Yan Huang, Chunshui Cao, Xu Liu, Yongzhen Huang, Liang Wang:
Causal Intervention for Sparse-View Gait Recognition. 77-85
- Digbalay Bose, Rajat Hebbar, Tiantian Feng, Krishna Somandepalli, Anfeng Xu, Shrikanth Narayanan:
MM-AU: Towards Multimodal Understanding of Advertisement Videos. 86-95
- Huiwei Lin, Shanshan Feng, Baoquan Zhang, Hongliang Qiao, Xutao Li, Yunming Ye:
UER: A Heuristic Bias Addressing Approach for Online Continual Learning. 96-104
- Peng Wu, Xiankai Lu, Jianbing Shen, Yilong Yin:
Clip Fusion with Bi-level Optimization for Human Mesh Reconstruction from Monocular Videos. 105-115
- Jinkai Zheng, Xinchen Liu, Shuai Wang, Lihao Wang, Chenggang Yan, Wu Liu:
Parsing is All You Need for Accurate Gait Recognition in the Wild. 116-124
- Dingyi Zhang, Yingming Li, Zhongfei Zhang:
Multi-Scale Similarity Aggregation for Dynamic Metric Learning. 125-134
- Yue Feng, Zhengye Zhang, Rong Quan, Limin Wang, Jie Qin:
RefineTAD: Learning Proposal-free Refinement for Temporal Action Detection. 135-143
- Zhenguang Liu, Xinyang Yu, Ruili Wang, Shuai Ye, Zhe Ma, Jianfeng Dong, Sifeng He, Feng Qian, Xiaobo Zhang, Roger Zimmermann, Lei Yang:
Video Infringement Detection via Feature Disentanglement and Mutual Information Maximization. 144-152
- Dongbao Yang, Yu Zhou, Xiaopeng Hong, Aoting Zhang, Xin Wei, Linchengxi Zeng, Zhi Qiao, Weiping Wang:
Pseudo Object Replay and Mining for Incremental Object Detection. 153-162
- Shiqin Wang, Xin Xu, Xianzheng Ma, Kui Jiang, Zheng Wang:
Informative Classes Matter: Towards Unsupervised Domain Adaptive Nighttime Semantic Segmentation. 163-172
- Ye Tian, Mengyu Yang, Lanshan Zhang, Zhizhen Zhang, Yang Liu, Xiaohui Xie, Xirong Que, Wendong Wang:
View while Moving: Efficient Video Recognition in Long-untrimmed Videos. 173-183
- Yimin Deng, Huaizhen Tang, Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao:
PMVC: Data Augmentation-Based Prosody Modeling for Expressive Voice Conversion. 184-192
- Gege Shi, Xueyang Fu, Chengzhi Cao, Zheng-Jun Zha:
Alleviating Spatial Misalignment and Motion Interference for UAV-based Video Recognition. 193-202
- Yang Liu, Zhaoyang Xia, Mengyang Zhao, Donglai Wei, Yuzheng Wang, Siao Liu, Bobo Ju, Gaoyun Fang, Jing Liu, Liang Song:
Learning Causality-inspired Representation Consistency for Video Anomaly Detection. 203-212
- Dongyue Guo, Yi Lin, Xuehang You, Zhongping Yang, Jizhe Zhou, Bo Yang, Jianwei Zhang, Han Shi, Shasha Hu, Zheng Zhang:
M2ATS: A Real-world Multimodal Air Traffic Situation Benchmark Dataset and Beyond. 213-221
- Jianghu Lu, Shikun Li, Kexin Bao, Pengju Wang, Zhenxing Qian, Shiming Ge:
Federated Learning with Label-Masking Distillation. 222-232
- Lingxiao Lu, Jiangtong Li, Junyan Cao, Li Niu, Liqing Zhang:
Painterly Image Harmonization using Diffusion Model. 233-241
- Xingran Xie, Ting Jin, Boxiang Yun, Qingli Li, Yan Wang:
Exploring Hyperspectral Histopathology Image Segmentation from a Deformable Perspective. 242-251
- Runhua Jiang, Yahong Han:
Uncertainty-Aware Variate Decomposition for Self-supervised Blind Image Deblurring. 252-260
Oral Session II: Understanding Multimedia Content -- Multimodal Fusion and Embedding
- Chao Sun, Min Chen, Jialiang Cheng, Han Liang, Chuanbo Zhu, Jincai Chen:
SCLAV: Supervised Cross-modal Contrastive Learning for Audio-Visual Coding. 261-270
- Feng Lin, Kaiqiang Fu, Hao Luo, Ziyue Zhan, Zhibo Wang, Zhenguang Liu, Lorenzo Cavallaro, Kui Ren:
Cross-Modal and Multi-Attribute Face Recognition: A Benchmark. 271-279
- Ye Wang, Junyang Chen, Mengzhu Wang, Hao Li, Wei Wang, Houcheng Su, Zhihui Lai, Wei Wang, Zhenghan Chen:
A Closer Look at Classifier in Adversarial Domain Generalization. 280-289
- Mengzhu Wang, Jianlong Yuan, Zhibin Wang:
Mixture-of-Experts Learner for Single Long-Tailed Domain Generalization. 290-299
- Chao Zhang, Jingwen Wei, Bo Wang, Zechao Li, Chunlin Chen, Huaxiong Li:
Robust Spectral Embedding Completion Based Incomplete Multi-view Clustering. 300-308
- Jinhui Pang, Zixuan Wang, Jiliang Tang, Mingyan Xiao, Nan Yin:
SA-GDA: Spectral Augmentation for Graph Domain Adaptation. 309-318
- Xihong Yang, Cheng Tan, Yue Liu, Ke Liang, Siwei Wang, Sihang Zhou, Jun Xia, Stan Z. Li, Xinwang Liu, En Zhu:
CONVERT: Contrastive Graph Clustering with Reliable Augmentation. 319-327
- Jintian Ji, Songhe Feng:
High-order Complementarity Induced Fast Multi-View Clustering with Enhanced Tensor Rank Minimization. 328-336
- Xihong Yang, Jiaqi Jin, Siwei Wang, Ke Liang, Yue Liu, Yi Wen, Suyuan Liu, Sihang Zhou, Xinwang Liu, En Zhu:
DealMVC: Dual Contrastive Calibration for Multi-view Clustering. 337-346
- Junming Hou, Qi Cao, Ran Ran, Che Liu, Junling Li, Liang-Jian Deng:
Bidomain Modeling Paradigm for Pansharpening. 347-357
- Yingying Wang, Yunlong Lin, Ge Meng, Zhenqi Fu, Yuhang Dong, Linyu Fan, Hedeng Yu, Xinghao Ding, Yue Huang:
Learning High-frequency Feature Enhancement and Alignment for Pan-sharpening. 358-367
- Xingfeng Li, Yinghui Sun, Quansen Sun, Jia Dai, Zhenwen Ren:
Distribution Consistency based Fast Anchor Imputation for Incomplete Multi-view Clustering. 368-376
- Yushen Wei, Yang Liu, Hong Yan, Guanbin Li, Liang Lin:
Visual Causal Scene Refinement for Video Question Answering. 377-386
- Hongye Liu, Xianhai Xie, Yang Gao, Zhou Yu:
Parameter-Efficient Transfer Learning for Audio-Visual-Language Tasks. 387-396
- Xi Chen, Yun Xiong, Siqi Wang, Haofen Wang, Tao Sheng, Yao Zhang, Yu Ye:
ReCo: A Dataset for Residential Community Layout Planning. 397-405
- Runmin Cong, Hongyu Liu, Chen Zhang, Wei Zhang, Feng Zheng, Ran Song, Sam Kwong:
Point-aware Interaction and CNN-induced Refinement Network for RGB-D Salient Object Detection. 406-416
- Jinrong Cui, Yuting Li, Yulu Fu, Jie Wen:
Multi-view Self-Expressive Subspace Clustering Network. 417-425
- Jian Huang, Yanli Ji, Yang Yang, Heng Tao Shen:
Cross-modality Representation Interactive Learning for Multimodal Sentiment Analysis. 426-434
- Yixuan Ma, Xiaolin Zhang, Peng Zhang, Kun Zhan:
Entropy Neural Estimation for Graph Contrastive Learning. 435-443
- Liguo Zhang, Zilin Tian, Yunfei Long, Sizhao Li, Guisheng Yin:
Cross-modal and Cross-medium Adversarial Attack for Audio. 444-453
- Liang Peng, Xin Wang, Xiaofeng Zhu:
Unsupervised Multiplex Graph learning with Complementary and Consistent Information. 454-462
- Yixuan Wu, Jintai Chen, Jiahuan Yan, Yiheng Zhu, Danny Z. Chen, Jian Wu:
GCL: Gradient-Guided Contrastive Learning for Medical Image Segmentation with Multi-Perspective Meta Labels. 463-471
- Zhiying Jiang, Zengxi Zhang, Jinyuan Liu, Xin Fan, Risheng Liu:
Multi-Spectral Image Stitching via Spatial Graph Reasoning. 472-480
- Jiaming Zhuo, Can Cui, Kun Fu, Bingxin Niu, Dongxiao He, Yuanfang Guo, Zhen Wang, Chuan Wang, Xiaochun Cao, Liang Yang:
Propagation is All You Need: A New Framework for Representation Learning and Classifier Training on Graphs. 481-489
- Yao Wu, Mingwei Xing, Yachao Zhang, Yuan Xie, Jianping Fan, Zhongchao Shi, Yanyun Qu:
Cross-modal Unsupervised Domain Adaptation for 3D Semantic Segmentation via Bidirectional Fusion-then-Distillation. 490-498
Oral Session III: Understanding Multimedia Content -- Vision and Language
- Yinjie Zhao, Lichen Zhao, Qian Yu, Lu Sheng, Jing Zhang, Dong Xu:
Distortion-aware Transformer in 360° Salient Object Detection. 499-508
- Zixiao Wang, Hongtao Xie, Yuxin Wang, Jianjun Xu, Boqiang Zhang, Yongdong Zhang:
Symmetrical Linguistic Feature Distillation with CLIP for Scene Text Recognition. 509-518
- Bo Zou, Chao Yang, Chengbin Quan, Youjian Zhao:
SpaceCLIP: A Vision-Language Pretraining Framework With Spatial Reconstruction On Text. 519-528
- Xu Huang, Jin Liu, Zhizhong Zhang, Yuan Xie:
Improving Cross-Modal Recipe Retrieval with Component-Aware Prompted CLIP Embedding. 529-537
- Shuhan Kong, Liang Li, Beichen Zhang, Wenyu Wang, Bin Jiang, Chenggang Yan, Changhao Xu:
Dynamic Contrastive Learning with Pseudo-samples Intervention for Weakly Supervised Joint Video MR and HD. 538-546
- Zheng Yuan, Qiao Jin, Chuanqi Tan, Zhengyun Zhao, Hongyi Yuan, Fei Huang, Songfang Huang:
RAMM: Retrieval-augmented Biomedical Visual Question Answering with Multi-modal Pre-training. 547-556
- Xiao Wang, Yaoyu Li, Tian Gan, Zheng Zhang, Jingjing Lv, Liqiang Nie:
RTQ: Rethinking Video-language Understanding Based on Image-text Model. 557-566
- Shanshan Zhong, Zhongzhan Huang, Wushao Wen, Jinghui Qin, Liang Lin:
SUR-adapter: Enhancing Text-to-Image Pre-trained Diffusion Models with Large Language Models. 567-578
- Xin Dong, Rui Wang, Siyuan Liang, Aishan Liu, Lihua Jing:
Face Encryption via Frequency-Restricted Identity-Agnostic Attacks. 579-588
- Peipei Song, Dan Guo, Xun Yang, Shengeng Tang, Erkun Yang, Meng Wang:
Emotion-Prior Awareness Network for Emotional Video Captioning. 589-600
- Dong Liu, Qirong Mao, Lijian Gao, Qinghua Ren, Zhenghan Chen, Ming Dong:
TE-KWS: Text-Informed Speech Enhancement for Noise-Robust Keyword Spotting. 601-610
- Jiancheng Pan, Qing Ma, Cong Bai:
A Prior Instruction Representation Framework for Remote Sensing Image-text Retrieval. 611-620
- Nirmalendu Prakash, Han Wang, Nguyen-Khoi Hoang, Ming Shan Hee, Roy Ka-Wei Lee:
PromptMTopic: Unsupervised Multimodal Topic Modeling of Memes using Large Language Models. 621-631
- Yue Lv, Jinxi Xiang, Jun Zhang, Wenming Yang, Xiao Han, Wei Yang:
Dynamic Low-Rank Instance Adaptation for Universal Neural Image Compression. 632-642
- Leigang Qu, Shengqiong Wu, Hao Fei, Liqiang Nie, Tat-Seng Chua:
LayoutLLM-T2I: Eliciting Layout Guidance from LLM for Text-to-Image Generation. 643-654
- Yue Zhang, Suchen Wang, Shichao Kan, Zhenyu Weng, Yigang Cen, Yap-Peng Tan:
POAR: Towards Open Vocabulary Pedestrian Attribute Recognition. 655-665
- Shengshan Hu, Wei Liu, Minghui Li, Yechao Zhang, Xiaogeng Liu, Xianlong Wang, Leo Yu Zhang, Junhui Hou:
PointCRT: Detecting Backdoor in 3D Point Cloud via Corruption Robustness. 666-675
- Rui Qin, Ming Sun, Fangyuan Zhang, Xing Wen, Bin Wang:
Blind Image Super-resolution with Rich Texture-Aware Codebook. 676-687
- Zizhang Wu, Zhuozheng Li, Zhi-Gang Fan, Yunzhe Wu, Jian Pu, Xianzhi Li:
V2Depth: Monocular Depth Estimation via Feature-Level Virtual-View Simulation and Refinement. 688-697
- Kai Chen, Zhipeng Wei, Jingjing Chen, Zuxuan Wu, Yu-Gang Jiang:
GCMA: Generative Cross-Modal Transferable Adversarial Attacks from Images to Videos. 698-708
- Lianyu Hu, Liqing Gao, Zekang Liu, Chi-Man Pun, Wei Feng:
AdaBrowse: Adaptive Video Browser for Efficient Continuous Sign Language Recognition. 709-718
- Lingfeng Li, Gangming Zhao, Yizhou Yu, Jinpeng Li:
Dynamic Triple Reweighting Network for Automatic Femoral Head Necrosis Diagnosis from Computed Tomography. 719-727
- Liu Liu, Jianming Du, Hao Wu, Xun Yang, Zhenguang Liu, Richang Hong, Meng Wang:
Category-Level Articulated Object 9D Pose Estimation via Reinforcement Learning. 728-736
- Qichao Ying, Jiaxin Liu, Sheng Li, Haisheng Xu, Zhenxing Qian, Xinpeng Zhang:
RetouchingFFHQ: A Large-scale Dataset for Fine-grained Face Retouching Detection. 737-746
- Xueyi Zhang, Chengwei Zhang, Tao Wang, Jun Tang, Songyang Lao, Haizhou Li:
Slow-Fast Time Parameter Aggregation Network for Class-Incremental Lip Reading. 747-756
- Yang Bai, Jingyao Wang, Min Cao, Chen Chen, Ziqiang Cao, Liqiang Nie, Min Zhang:
Text-based Person Search without Parallel Image-Text Data. 757-767
- Jiawei Liang, Siyuan Liang, Aishan Liu, Ke Ma, Jingzhi Li, Xiaochun Cao:
Exploring Inconsistent Knowledge Distillation for Object Detection with Data Augmentation. 768-778
- Sun'ao Liu, Yiheng Zhang, Zhaofan Qiu, Hongtao Xie, Yongdong Zhang, Ting Yao:
CARIS: Context-Aware Referring Image Segmentation. 779-788
- Shizhou Zhang, Qingchun Yang, De Cheng, Yinghui Xing, Guoqiang Liang, Peng Wang, Yanning Zhang:
Ground-to-Aerial Person Search: Benchmark Dataset and Approach. 789-799
- Fan Jiang, Zilei Wang:
Sparse Sharing Relation Network for Panoptic Driving Perception. 800-808
Oral Session IV: Engaging Users with Multimedia -- Emotional and Social Signals
- Daoming Zong, Chaoyue Ding, Baoxiang Li, Jiakui Li, Ken Zheng, Qunyan Zhou:
AcFormer: An Aligned and Compact Transformer for Multimodal Sentiment Analysis. 833-842
- Zeng Tao, Yan Wang, Zhaoyu Chen, Boyang Wang, Shaoqi Yan, Kaixun Jiang, Shuyong Gao, Wenqiang Zhang:
Freq-HD: An Interpretable Frequency-based High-Dynamics Affective Clip Selection Method for in-the-Wild Facial Expression Recognition in Videos. 843-852
- Peiguang Jing, Xianyi Liu, Ji Wang, Yinwei Wei, Liqiang Nie, Yuting Su:
StyleEDL: Style-Guided High-order Attention Network for Image Emotion Distribution Learning. 853-861
- Junjie Zhu, Bingjun Luo, Ao Sun, Jinghang Tan, Xibin Zhao, Yue Gao:
Variance-Aware Bi-Attention Expression Transformer for Open-Set Facial Expression Recognition in the Wild. 862-870
- Zixin Zhang, Fan Qi, Shuai Li, Changsheng Xu:
AffectFAL: Federated Active Affective Computing with Non-IID Data. 871-882
- Peiliang Gong, Ziyu Jia, Pengpai Wang, Yueying Zhou, Daoqiang Zhang:
ASTDF-Net: Attention-Based Spatial-Temporal Dual-Stream Fusion Network for EEG-Based Emotion Recognition. 883-892
Oral Session V: Engaging Users with Multimedia -- Multimedia Search and Recommendation
- Yishu Liu, Qingpeng Wu, Zheng Zhang, Jingyi Zhang, Guangming Lu:
Multi-Granularity Interactive Transformer Hashing for Cross-modal Retrieval. 893-902
- Wenjie Wang, Xinyu Lin, Liuhui Wang, Fuli Feng, Yinwei Wei, Tat-Seng Chua:
Equivariant Learning for Out-of-Distribution Cold-start Recommendation. 903-914
- Haokun Wen, Xian Zhang, Xuemeng Song, Yinwei Wei, Liqiang Nie:
Target-Guided Composed Image Retrieval. 915-923
- Haoxuan Li, Yi Bin, Junrong Liao, Yang Yang, Heng Tao Shen:
Your Negative May not Be True Negative: Boosting Image-Text Matching with False Negative Elimination. 924-934
- Xin Zhou, Zhiqi Shen:
A Tale of Two Graphs: Freezing and Denoising Graph Structures for Multimodal Recommendation. 935-943
- Guiwei Zhang, Yongfei Zhang, Zichang Tan:
ProtoHPE: Prototype-guided High-frequency Patch Enhancement for Visible-Infrared Person Re-identification. 944-954
- Wei Ji, Xiangyan Liu, An Zhang, Yinwei Wei, Yongxin Ni, Xiang Wang:
Online Distillation-enhanced Multi-modal Transformer for Sequential Recommendation. 955-965
- Junyang Chen, Jialong Wang, Zhijiang Dai, Huisi Wu, Mengzhu Wang, Qin Zhang, Huan Wang:
Zero-shot Micro-video Classification with Neural Variational Inference in Graph Prototype Network. 966-974
- Zhiguo Chen, Xun Jiang, Xing Xu, Zuo Cao, Yijun Mo, Heng Tao Shen:
Joint Searching and Grounding: Multi-Granularity Video Content Retrieval. 975-983
- Yuyuan Li, Chaochao Chen, Xiaolin Zheng, Yizhao Zhang, Zhongxuan Han, Dan Meng, Jun Wang:
Making Users Indistinguishable: Attribute-wise Unlearning in Recommender Systems. 984-994
- Dugang Liu, Yang Qiao, Xing Tang, Liang Chen, Xiuqiang He, Zhong Ming:
Prior-Guided Accuracy-Bias Tradeoff Learning for CTR Prediction in Multimedia Recommendation. 995-1003
- Haoyue Bai, Min Hou, Le Wu, Yonghui Yang, Kun Zhang, Richang Hong, Meng Wang:
GoRec: A Generative Cold-start Recommendation Framework. 1004-1012
- Jingzhi Li, Fengling Li, Lei Zhu, Hui Cui, Jingjing Li:
Prototype-guided Knowledge Transfer for Federated Unsupervised Cross-modal Hashing. 1013-1022
Oral Session VI: Engaging Users with Multimedia -- Interactions and Quality of Experience
- Shuai He, Anlong Ming, Shuntian Zheng, Haobin Zhong, Huadong Ma:
EAT: An Enhancer for Aesthetics-Oriented Transformers. 1023-1032
- Sicheng Yang, Zilin Wang, Zhiyong Wu, Minglei Li, Zhensong Zhang, Qiaochu Huang, Lei Hao, Songcen Xu, Xiaofei Wu, Changpeng Yang, Zonghong Dai:
UnifiedGesture: A Unified Gesture Synthesis Model for Multiple Skeletons. 1033-1044