"Siamese Vision Transformers are Scalable Audio-Visual Learners."

Yan-Bo Lin, Gedas Bertasius (2024)

DOI: 10.1007/978-3-031-72630-9_18

access: closed

type: Conference or Workshop Paper

metadata version: 2025-01-03