


17th ECCV 2022: Tel Aviv, Israel - Volume 36
- Shai Avidan, Gabriel J. Brostow, Moustapha Cissé, Giovanni Maria Farinella, Tal Hassner:
  Computer Vision - ECCV 2022 - 17th European Conference, Tel Aviv, Israel, October 23-27, 2022, Proceedings, Part XXXVI. Lecture Notes in Computer Science 13696, Springer 2022, ISBN 978-3-031-20058-8
- Benedikt Boecking, Naoto Usuyama, Shruthi Bannur, Daniel C. Castro, Anton Schwaighofer, Stephanie L. Hyland, Maria Wetscherek, Tristan Naumann, Aditya V. Nori, Javier Alvarez-Valle, Hoifung Poon, Ozan Oktay:
  Making the Most of Text Semantics to Improve Biomedical Vision-Language Processing. 1-21
- Shipeng Yan, Lanqing Hong, Hang Xu, Jianhua Han, Tinne Tuytelaars, Zhenguo Li, Xuming He:
  Generative Negative Text Replay for Continual Vision-Language Pretraining. 22-38
- Junbin Xiao, Pan Zhou, Tat-Seng Chua, Shuicheng Yan:
  Video Graph Transformer for Video Question Answering. 39-58
- Kun Yan, Lei Ji, Chenfei Wu, Jianmin Bao, Ming Zhou, Nan Duan, Shuai Ma:
  Trace Controlled Text to Image Generation. 59-75
- A. J. Piergiovanni, Kairo Morton, Weicheng Kuo, Michael S. Ryoo, Anelia Angelova:
  Video Question Answering with Iterative Video-Text Co-tokenization. 76-94
- Long Chen, Yuhang Zheng, Jun Xiao:
  Rethinking Data Augmentation for Robust Visual Question Answering. 95-112
- Zhen Wang, Long Chen, Wenbo Ma, Guangxing Han, Yulei Niu, Jian Shao, Jun Xiao:
  Explicit Image Caption Editing. 113-129
- Jiachang Hao, Haifeng Sun, Pengfei Ren, Jingyu Wang, Qi Qi, Jianxin Liao:
  Can Shuffling Video Benefit Temporal Bias Problem: A Novel Training Framework for Temporal Grounding. 130-147
- Spencer Whitehead, Suzanne Petryk, Vedaad Shakib, Joseph Gonzalez, Trevor Darrell, Anna Rohrbach, Marcus Rohrbach:
  Reliable Visual Question Answering: Abstain Rather Than Answer Incorrectly. 148-166
- Van-Quang Nguyen, Masanori Suganuma, Takayuki Okatani:
  GRIT: Faster and Better Image Captioning Transformer Using Dual Visual Features. 167-184
- Sunjae Yoon, Ji Woo Hong, Eunseop Yoon, Dahyun Kim, Junyeong Kim, Hee Suk Yoon, Chang D. Yoo:
  Selective Query-Guided Debiasing for Video Corpus Moment Retrieval. 185-200
- Cheng Shi, Sibei Yang:
  Spatial and Visual Perspective-Taking via View Rotation and Relation Reasoning for Embodied Reference Understanding. 201-218
- Zihang Meng, David Yang, Xuefei Cao, Ashish Shah, Ser-Nam Lim:
  Object-Centric Unsupervised Image Captioning. 219-235
- Quan Cui, Boyan Zhou, Yu Guo, Weidong Yin, Hao Wu, Osamu Yoshie, Yubo Chen:
  Contrastive Vision-Language Pre-training with Limited Resources. 236-253
- Sheng Fang, Shuhui Wang, Junbao Zhuo, Xinzhe Han, Qingming Huang:
  Learning Linguistic Association Towards Efficient Text-Video Retrieval. 254-270
- Zanming Huang, Zhongkai Shangguan, Jimuyang Zhang, Gilad Bar, Matthew Boyd, Eshed Ohn-Bar:
  ASSISTER: Assistive Navigation via Conditional Instruction Generation. 271-289
- Zhaowei Cai, Gukyeong Kwon, Avinash Ravichandran, Erhan Bas, Zhuowen Tu, Rahul Bhotika, Stefano Soatto:
  X-DETR: A Versatile Architecture for Instance-wise Vision-Language Tasks. 290-308
- Wenhao Cheng, Xingping Dong, Salman H. Khan, Jianbing Shen:
  Learning Disentanglement with Decoupled Labels for Vision-Language Navigation. 309-329
- Qingpei Guo, Kaisheng Yao, Wei Chu:
  Switch-BERT: Learning to Model Multimodal Interactions by Switching Attention and Input. 330-346
- Bowen Li:
  Word-Level Fine-Grained Story Visualization. 347-362
- Qi Zhang, Yuqing Song, Qin Jin:
  Unifying Event Detection and Captioning as Sequence Generation via Pre-training. 363-379
- Chuang Lin, Yi Jiang, Jianfei Cai, Lizhen Qu, Gholamreza Haffari, Zehuan Yuan:
  Multimodal Transformer with Variable-Length Memory for Vision-and-Language Navigation. 380-397
- Christopher Thomas, Yipeng Zhang, Shih-Fu Chang:
  Fine-Grained Visual Entailment. 398-416
- Ayush Jain, Nikolaos Gkanatsios, Ishita Mediratta, Katerina Fragkiadaki:
  Bottom Up Top Down Detection Transformers for Language Grounding in Images and Point Clouds. 417-433
- Yifeng Zhang, Ming Jiang, Qi Zhao:
  New Datasets and Models for Contextual Reasoning in Visual Dialog. 434-451
- Joanna Hong, Minsu Kim, Yong Man Ro:
  VisageSynTalk: Unseen Speaker Video-to-Speech Synthesis via Speech-Visage Feature Selection. 452-468
- Matan Levy, Rami Ben-Ari, Dani Lischinski:
  Classification-Regression for Chart Comprehension. 469-484
- Benita Wong, Joya Chen, You Wu, Stan Weixian Lei, Dongxing Mao, Difei Gao, Mike Zheng Shou:
  AssistQ: Affordance-Centric Question-Driven Task Completion for Egocentric Assistant. 485-501
- Weicheng Kuo, Fred Bertsch, Wei Li, A. J. Piergiovanni, Mohammad Saffar, Anelia Angelova:
  FindIt: Generalized Localization with Natural Language Queries. 502-520
- Zhengyuan Yang, Zhe Gan, Jianfeng Wang, Xiaowei Hu, Faisal Ahmed, Zicheng Liu, Yumao Lu, Lijuan Wang:
  UniTAB: Unifying Text and Box Outputs for Grounded Vision-Language Modeling. 521-539
- Golnaz Ghiasi, Xiuye Gu, Yin Cui, Tsung-Yi Lin:
  Scaling Open-Vocabulary Image Segmentation with Image-Level Labels. 540-557
- Jack Hessel, Jena D. Hwang, Jae Sung Park, Rowan Zellers, Chandra Bhagavatula, Anna Rohrbach, Kate Saenko, Yejin Choi:
  The Abduction of Sherlock Holmes: A Dataset for Visual Abductive Reasoning. 558-575
- Minsu Kim, Hyunjun Kim, Yong Man Ro:
  Speaker-Adaptive Lip Reading with User-Dependent Padding. 576-593
- Tan M. Dinh, Rang Nguyen, Binh-Son Hua:
  TISE: Bag of Metrics for Text-to-Image Synthesis Evaluation. 594-609
- Morgan Heisler, Amin Banitalebi-Dehkordi, Yong Zhang:
  SemAug: Semantically Meaningful Image Augmentations for Object Detection Through Language Grounding. 610-626
- Myungsub Choi:
  Referring Object Manipulation of Natural Images with Conditional Classifier-Free Guidance. 627-643
- Reuben Tan, Bryan A. Plummer, Kate Saenko, J. P. Lewis, Avneesh Sud, Thomas Leung:
  NewsStories: Illustrating Articles with Visual Summaries. 644-661
- Amita Kamath, Christopher Clark, Tanmay Gupta, Eric Kolve, Derek Hoiem, Aniruddha Kembhavi:
  Webly Supervised Concept Expansion for General Purpose Vision Models. 662-681
- Kaiwen Zhou, Xin Eric Wang:
  FedVLN: Privacy-Preserving Federated Vision-and-Language Navigation. 682-699
- Haoran Wang, Dongliang He, Wenhao Wu, Boyang Xia, Min Yang, Fu Li, Yunlong Yu, Zhong Ji, Errui Ding, Jingdong Wang:
  CODER: Coupled Diversity-Sensitive Momentum Contrastive Learning for Image-Text Retrieval. 700-716
- Tsu-Jui Fu, Xin Eric Wang, William Yang Wang:
  Language-Driven Artistic Style Transfer. 717-734
- Zaid Khan, B. G. Vijay Kumar, Xiang Yu, Samuel Schulter, Manmohan Chandraker, Yun Fu:
  Single-Stream Multi-level Alignment for Vision-Language Pretraining. 735-751
