IEEE/ACM Transactions on Audio, Speech and Language Processing, Volume 32
Volume 32, 2024
- Jin Chu Wu, Raghu N. Kacker:
Statistical Analysis for Speaker Recognition Evaluation With Data Dependence and Three Score Distributions. 1-14
- Yongwei Zhou, Junwei Bao, Youzheng Wu, Xiaodong He, Tiejun Zhao:
Operation-Augmented Numerical Reasoning for Question Answering. 15-28
- Anurenjan Purushothaman, Debottam Dutta, Rohit Kumar, Sriram Ganapathy:
Speech Dereverberation With Frequency Domain Autoregressive Modeling. 29-38
- Leyuan Qu, Taihao Li, Cornelius Weber, Theresa Pekarek-Rosin, Fuji Ren, Stefan Wermter:
Disentangling Prosody Representations With Unsupervised Speech Reconstruction. 39-54
- Mathias Bach Pedersen, Søren Holdt Jensen, Zheng-Hua Tan, Jesper Jensen:
Data-Driven Non-Intrusive Speech Intelligibility Prediction Using Speech Presence Probability. 55-67
- Yuanbo Hou, Bo Kang, Andrew Mitchell, Wenwu Wang, Jian Kang, Dick Botteldooren:
Cooperative Scene-Event Modelling for Acoustic Scene Classification. 68-82
- Xiaotong Jiang, Peiwen You, Chen Chen, Zhongqing Wang, Guodong Zhou:
Exploring Scope Detection for Aspect-Based Sentiment Analysis. 83-94
- Xuenan Xu, Zeyu Xie, Mengyue Wu, Kai Yu:
Beyond the Status Quo: A Contemporary Survey of Advances and Challenges in Audio Captioning. 95-112
- Federico Miotello, Mirco Pezzoli, Luca Comanducci, Fabio Antonacci, Augusto Sarti:
Deep Prior-Based Audio Inpainting Using Multi-Resolution Harmonic Convolutional Neural Networks. 113-123
- Cristian Lucian Stanciu, Jacob Benesty, Constantin Paleologu, Ruxandra-Liana Costea, Laura-Maria Dogariu, Silviu Ciochina:
Decomposition-Based Wiener Filter Using the Kronecker Product and Conjugate Gradient Method. 124-138
- Huiyao Chen, Yueheng Sun, Meishan Zhang, Min Zhang:
Automatic Noise Generation and Reduction for Text Classification. 139-150
- Jiaming Xu, Jian Cui, Yunzhe Hao, Bo Xu:
Multi-Cue Guided Semi-Supervised Learning Toward Target Speaker Separation in Real Environments. 151-163
- Yang Xiang, Jesper Lisby Højvang, Morten Højfeldt Rasmussen, Mads Græsbøll Christensen:
A Two-Stage Deep Representation Learning-Based Speech Enhancement Method Using Variational Autoencoder and Adversarial Training. 164-177
- Xiao Li, Ruirui Liu, Huichou Huang, Qingyao Wu:
Contrastive Learning for Target Speaker Extraction With Attention-Based Fusion. 178-188
- Xiaobo Liang, Runze Mao, Lijun Wu, Juntao Li, Min Zhang, Qing Li:
Enhancing Low-Resource NLP by Consistency Training With Data and Model Perturbations. 189-199
- Haisheng Lu, Jiangnan Liang, Chuang Shi:
Comments on "Primary-Ambient Extraction Using Ambient Spectrum Estimation for Immersive Spatial Audio Reproduction". 200-202
- Szymon Drgas, Lars Bramsløw, Archontis Politis, Gaurav Naithani, Tuomas Virtanen:
Dynamic Processing Neural Network Architecture for Hearing Loss Compensation. 203-214
- Femke B. Gelderblom, Tron V. Tronstad, Torbjørn Svendsen, Tor André Myrvoll:
On the Predictive Power of Objective Intelligibility Metrics for the Subjective Performance of Deep Complex Convolutional Recurrent Speech Enhancement Networks. 215-226
- Thomas Haubner, Andreas Brendel, Walter Kellermann:
End-to-End Deep Learning-Based Adaptation Control for Linear Acoustic Echo Cancellation. 227-238
- Congcong Jiang, Tieyun Qian, Bing Liu:
One General Teacher for Multi-Data Multi-Task: A New Knowledge Distillation Framework for Discourse Relation Analysis. 239-249
- Khandokar Md. Nayem, Donald S. Williamson:
Attention-Based Speech Enhancement Using Human Quality Perception Modeling. 250-260
- Ying Zhang, Fandong Meng, Yufeng Chen, Jinan Xu, Jie Zhou:
Complex Question Enhanced Transfer Learning for Zero-Shot Joint Information Extraction. 261-275
- Jingsong Yan, Piji Li, Haibin Chen, Junhao Zheng, Qianli Ma:
Does the Order Matter? A Random Generative Way to Learn Label Hierarchy for Hierarchical Text Classification. 276-285
- Georgios Paraskevopoulos, Theodoros Kouzelis, Georgios Rouvalis, Athanasios Katsamanis, Vassilis Katsouros, Alexandros Potamianos:
Sample-Efficient Unsupervised Domain Adaptation of Speech Recognition Systems: A Case Study for Modern Greek. 286-299
- Ernesto Accolti, Javier Gimenez, Michael Vorländer:
Uncertainties of Room Acoustics Simulation Due to Directivity Data of Musical Instruments. 300-309
- Yoshiki Masuyama, Kouei Yamaoka, Yuma Kinoshita, Taishi Nakashima, Nobutaka Ono:
Causal and Relaxed-Distortionless Response Beamforming for Online Target Source Extraction. 310-324
- Rohit Prabhavalkar, Takaaki Hori, Tara N. Sainath, Ralf Schlüter, Shinji Watanabe:
End-to-End Speech Recognition: A Survey. 325-351
- Yun Zhao, Dexi Liu, Changxuan Wan, Xiping Liu, Jian-Yun Nie, Jiaming Liu:
JMS-QA: A Joint Hierarchical Architecture for Mental Health Question Answering. 352-363
- Shiwen Ni, Jiawen Li, Min Yang, Hung-Yu Kao:
DropAttack: A Random Dropped Weight Attack Adversarial Training for Natural Language Understanding. 364-373
- Tiantian Zhu, Yang Qin, Ming Feng, Qingcai Chen, Baotian Hu, Yang Xiang:
BioPRO: Context-Infused Prompt Learning for Biomedical Entity Linking. 374-385
- Jiapu Wang, Boyue Wang, Junbin Gao, Simin Hu, Yongli Hu, Baocai Yin:
Multi-Level Interaction Based Knowledge Graph Completion. 386-396
- Qiangqiang Zhang, Dongyuan Lin, Yingying Xiao, Yunfei Zheng, Shiyuan Wang:
Error Reused Filtered-X Least Mean Square Algorithm for Active Noise Control. 397-412
- Zengrui Jin, Mengzhe Geng, Jiajun Deng, Tianzi Wang, Shujie Hu, Guinan Li, Xunying Liu:
Personalized Adversarial Data Augmentation for Dysarthric and Elderly Speech Recognition. 413-429
- Jun Kong, Jin Wang, Xuejie Zhang:
Adaptive Ensemble Self-Distillation With Consistent Gradients for Fast Inference of Pretrained Language Models. 430-442
- Srdan Kitic, Jérôme Daniel:
Blind Identification of Ambisonic Reduced Room Impulse Response. 443-458
- Qijie Shao, Pengcheng Guo, Jinghao Yan, Pengfei Hu, Lei Xie:
Decoupling and Interacting Multi-Task Learning Network for Joint Speech and Accent Recognition. 459-470
- Han Zhu, Gaofeng Cheng, Jindong Wang, Wenxin Hou, Pengyuan Zhang, Yonghong Yan:
Boosting Cross-Domain Speech Recognition With Self-Supervision. 471-485
- Yile Wang, Yue Zhang, Peng Li, Yang Liu:
Gradual Syntactic Label Replacement for Language Model Pre-Training. 486-496
- Penghui Ma, Jianfeng Li, Jingjing Pan, Xiaofei Zhang, Roberto Gil-Pita:
Coherent Signal DOA Estimation With Coprime Array: Exploiting Signal Subspace Reconstructing Strategy. 497-508
- Emma Hamel, Nickvash Kani:
Factors That Influence Automatic Recognition of African-American Vernacular English in Machine-Learning Models. 509-516
- Jingbei Li, Sipan Li, Ping Chen, Luwen Zhang, Yi Meng, Zhiyong Wu, Helen Meng, Qiao Tian, Yuping Wang, Yuxuan Wang:
Joint Multiscale Cross-Lingual Speaking Style Transfer With Bidirectional Attention Mechanism for Automatic Dubbing. 517-528
- Bing Han, Zhengyang Chen, Yanmin Qian:
Self-Supervised Learning With Cluster-Aware-DINO for High-Performance Robust Speaker Verification. 529-541
- Kristina Tesch, Timo Gerkmann:
Multi-Channel Speech Separation Using Spatially Selective Deep Non-Linear Filters. 542-553
- Hao-Chen Pei, Hao Fang, Xin Luo, Xin-Shun Xu:
Gradformer: A Framework for Multi-Aspect Multi-Granularity Pronunciation Assessment. 554-563
- Garima Sharma, Karthikeyan Umapathy, Sridhar Krishnan:
Time-Frequency Scattergrams for Biomedical Audio Signal Representation and Classification. 564-576
- Zhibo Man, Zengcheng Huang, Yujie Zhang, Yu Li, Yuanmeng Chen, Yufeng Chen, Jinan Xu:
WDSRL: Multi-Domain Neural Machine Translation With Word-Level Domain-Sensitive Representation Learning. 577-590
- Chin-Po Chen, Ho-Hsien Pan, Susan Shur-Fen Gau, Chi-Chun Lee:
Using Measures of Vowel Space for Autistic Traits Characterization. 591-607
- Kevin Wilkinghoff, Frank Kurth:
Why Do Angular Margin Losses Work Well for Semi-Supervised Anomalous Sound Detection? 608-622
- Aku Rouhe, Tamás Grósz, Mikko Kurimo:
Principled Comparisons for End-to-End Speech Recognition: Attention vs Hybrid at the 1000-Hour Scale. 623-638
- Yile Wang, Yue Zhang:
Lost in Context? On the Sense-Wise Variance of Contextualized Word Embeddings. 639-650
- Christoph Hold, Ville Pulkki, Archontis Politis, Leo McCormack:
Compression of Higher-Order Ambisonic Signals Using Directional Audio Coding. 651-665
- Shouhui Wang, Biao Qin:
A Novel Joint Training Model for Knowledge Base Question Answering. 666-679
- Songbin Li, Jingang Wang, Peng Liu, Ke Shi:
SANet: A Compressed Speech Encoder and Steganography Algorithm Independent Steganalysis Deep Neural Network. 680-690
- Tarek Kanan, Amani AbedAlghafer, Shadi AlZu'bi, Bilal Hawashin, Ala Mughaid, Ghassan Kanaan, M. M. Kamruzzaman:
An Intelligent Health Care System for Detecting Drug Abuse in Social Media Platforms Based on Low Resource Language. 691-703
- Alejandro Santorum Varela, Svetlana Stoyanchev, Simon Keizer, Rama Doddipatla, Kate M. Knill:
Entity Resolution in Situated Dialog With Unimodal and Multimodal Transformers. 704-713
- Huang He, Hua Lu, Siqi Bao, Fan Wang, Hua Wu, Zheng-Yu Niu, Haifeng Wang:
Learning to Select External Knowledge With Multi-Scale Negative Sampling. 714-720
- Hua Lu, Zhen Guo, Chanjuan Li, Yunyi Yang, Huang He, Siqi Bao:
Towards Building an Open-Domain Dialogue System Incorporated With Internet Memes. 721-726
- Jungwoo Lim, Taesun Whang, Dongyub Lee, Heuiseok Lim:
Adaptive Multi-Domain Dialogue State Tracking on Spoken Conversations. 727-732
- David Thulke, Nico Daheim, Christian Dugast, Hermann Ney:
Task-Oriented Document-Grounded Dialog Systems by HLTPR@RWTH for DSTC9 and DSTC10. 733-741
- Han Wu, Kun Xu, Linqi Song:
Structure-Aware Dialogue Modeling Methods for Conversational Semantic Role Labeling. 742-752
- Zhe Chen, Hongcheng Liu, Yu Wang:
DialogMCF: Multimodal Context Flow for Audio Visual Scene-Aware Dialog. 753-764
- Koichiro Yoshino, Yun-Nung Chen, Paul A. Crook, Satwik Kottur, Jinchao Li, Behnam Hedayatnia, Seungwhan Moon, Zhengcong Fei, Zekang Li, Jinchao Zhang, Yang Feng, Jie Zhou, Seokhwan Kim, Yang Liu, Di Jin, Alexandros Papangelis, Karthik Gopalakrishnan, Dilek Hakkani-Tur, Babak Damavandi, Alborz Geramifard, Chiori Hori, Ankit Shah, Chen Zhang, Haizhou Li, João Sedoc, Luis F. D'Haro, Rafael E. Banchs, Alexander Rudnicky:
Overview of the Tenth Dialog System Technology Challenge: DSTC10. 765-778
- Shekhar Kumar Yadav, Nithin V. George:
Joint Dereverberation and Beamforming With Blind Estimation of the Shape Parameter of the Desired Source Prior. 779-793
- Yanxiong Li, Zhongjie Jiang, Qisheng Huang, Wenchang Cao, Jialong Li:
Lightweight Speaker Verification Using Transformation Module With Feature Partition and Fusion. 794-806
- Yuhan Dai, Zhirui Zhang, Yichao Du, Shengcai Liu, Lemao Liu, Tong Xu:
Datastore Distillation for Nearest Neighbor Machine Translation. 807-817
- Changtao Li, Feiran Yang, Jun Yang:
A Two-Stage Approach to Quality Restoration of Bone-Conducted Speech. 818-829
- Jie Zhou, Yuanbiao Lin, Qin Chen, Qi Zhang, Xuanjing Huang, Liang He:
CausalABSC: Causal Inference for Aspect Debiasing in Aspect-Based Sentiment Classification. 830-840
- Ruiying Lu, Bo Chen, Dandan Guo, Dongsheng Wang, Mingyuan Zhou:
Hierarchical Topic-Aware Contextualized Transformers. 841-852
- Yaru Zhao, Bo Cheng, Yakun Huang, Zhiguo Wan:
FluGCF: A Fluent Dialogue Generation Model With Coherent Concept Entity Flow. 853-867
- Changhao Ding, Zhangjie Fu, Zhongliang Yang, Qi Yu, Daqiu Li, Yongfeng Huang:
Context-Aware Linguistic Steganography Model Based on Neural Machine Translation. 868-878
- Zainab Alhakeem, Se-In Jang, Hong-Goo Kang:
Disentangled Representations in Local-Global Contexts for Arabic Dialect Identification. 879-890
- Jae-Hong Lee, Joon-Hyuk Chang:
Partitioning Attention Weight: Mitigating Adverse Effect of Incorrect Pseudo-Labels for Self-Supervised ASR. 891-905
- Ryo Fukuda, Katsuhito Sudoh, Satoshi Nakamura:
Improving Speech Translation Accuracy and Time Efficiency With Fine-Tuned wav2vec 2.0-Based Speech Segmentation. 906-916
- Seong-Gyun Leem, Daniel Fulford, Jukka-Pekka Onnela, David Gard, Carlos Busso:
Selective Acoustic Feature Enhancement for Speech Emotion Recognition With Noisy Speech. 917-929
- Alexander Bohlender, Ann Spriet, Wouter Tirry, Nilesh Madhu:
Spatially Selective Speaker Separation Using a DNN With a Location Dependent Feature Extraction. 930-945
- Matan Karo, Arie Yeredor, Itshak Lapidot:
Compact Time-Domain Representation for Logical Access Spoofed Audio. 946-958
- Or Berebi, Zamir Ben-Hur, David Lou Alon, Boaz Rafaely:
Analysis and Design of Head-Tracked Compensation for Bilateral Ambisonics. 959-972
- Wei Wang, Yanmin Qian:
Universal Cross-Lingual Data Generation for Low Resource ASR. 973-983
- Davide Berghi, Philip J. B. Jackson:
Leveraging Visual Supervision for Array-Based Active Speaker Detection and Localization. 984-995
- Daniel Aleksander Krause, Guillermo García-Barrios, Archontis Politis, Annamaria Mesaros:
Binaural Sound Source Distance Estimation and Localization for a Moving Listener. 996-1011
- Seung-Bin Kim, Sang-Hoon Lee, Ha-Yeong Choi, Seong-Whan Lee:
Audio Super-Resolution With Robust Speech Representation Learning of Masked Autoencoder. 1012-1022
- Omer Musa Battal, Aykut Koç:
Automatic Construction of Sememe Knowledge Bases From Machine Readable Dictionaries. 1023-1035
- Varun Krishna, Tarun Sai, Sriram Ganapathy:
Representation Learning With Hidden Unit Clustering for Low Resource Speech Applications. 1036-1047
- Zhengding Luo, Dongyuan Shi, Woon-Seng Gan, Qirui Huang:
Delayless Generative Fixed-Filter Active Noise Control Based on Deep Learning and Bayesian Filter. 1048-1060
- Zewen Chi, Heyan Huang, Luyang Liu, Yu Bai, Xiaoyan Gao, Xian-Ling Mao:
Can Pretrained English Language Models Benefit Non-English NLP Systems in Low-Resource Scenarios? 1061-1074
- Rui Liu, Yifan Hu, Haolin Zuo, Zhaojie Luo, Longbiao Wang, Guanglai Gao:
Text-to-Speech for Low-Resource Agglutinative Language With Morphology-Aware Language Model Pre-Training. 1075-1087
- Shu Jiang, Zuchao Li, Hai Zhao, Weiping Ding:
Entity-Relation Extraction as Full Shallow Semantic Dependency Parsing. 1088-1099
- Yoav Vered, Stephen J. Elliott:
A Parallel Analog and Digital Adaptive Feedforward Controller for Active Noise Control. 1100-1108
- Puning Zhang, Rongjian Zhao, Boran Yang, Yuexian Li, Zhigang Yang:
Integrated Syntactic and Semantic Tree for Targeted Sentiment Classification Using Dual-Channel Graph Convolutional Network. 1109-1124
- Xu Wang, Hainan Zhang, Shuai Zhao, Hongshen Chen, Zhuoye Ding, Zhiguo Wan, Bo Cheng, Yanyan Lan:
Debiasing Counterfactual Context With Causal Inference for Multi-Turn Dialogue Reasoning. 1125-1132
- Hoang Ngoc Chau, Tien Dat Bui, Huu Binh Nguyen, Thanh Thi Hien Duong, Quoc-Cuong Nguyen:
A Novel Approach to Multi-Channel Speech Enhancement Based on Graph Neural Networks. 1133-1144
- Yuchen Hu, Chen Chen, Qiushi Zhu, Eng Siong Chng:
Wav2code: Restore Clean Speech Representations via Codebook Lookup for Noise-Robust ASR. 1145-1156
- Tetsuya Ueda, Tomohiro Nakatani, Rintaro Ikeshita, Keisuke Kinoshita, Shoko Araki, Shoji Makino:
Blind and Spatially-Regularized Online Joint Optimization of Source Separation, Dereverberation, and Noise Reduction. 1157-1172
- Vibhav Agarwal, Sourav Ghosh, Harichandana B. S. S, Himanshu Arora, Barath Raj Kandur Raja:
TrICy: Trigger-Guided Data-to-Text Generation With Intent Aware Attention-Copy. 1173-1184
- Christoph Böddeker, Aswin Shanmugam Subramanian, Gordon Wichern, Reinhold Haeb-Umbach, Jonathan Le Roux:
TS-SEP: Joint Diarization and Separation Conditioned on Estimated Speaker Embeddings. 1185-1197
- Reza Varzandeh, Simon Doclo, Volker Hohmann:
Speech-Aware Binaural DOA Estimation Utilizing Periodicity and Spatial Features in Convolutional Neural Networks. 1198-1213
- Yigitcan Özer, Meinard Müller:
Source Separation of Piano Concertos Using Musically Motivated Augmentation Techniques. 1214-1225
- Lior Frenkel, Shlomo E. Chazan, Jacob Goldberger:
Domain Adaptation Using Suitable Pseudo Labels for Speech Enhancement and Dereverberation. 1226-1236
- Jiahao Zhao, Wenji Mao, Daniel Dajun Zeng:
Disentangled Text Representation Learning With Information-Theoretic Perspective for Adversarial Robustness. 1237-1247
- Dong Zhou, Fang Lei, Lin Li, Yongmei Zhou, Aimin Yang:
Cross-Modal Interaction via Reinforcement Feedback for Audio-Lyrics Retrieval. 1248-1260
- Xuechen Liu, Md. Sahidullah, Kong Aik Lee, Tomi Kinnunen:
Generalizing Speaker Verification for Spoof Awareness in the Embedding Space. 1261-1273
- Shiyao Cui, Jiangxia Cao, Xin Cong, Jiawei Sheng, Quangang Li, Tingwen Liu, Jinqiao Shi:
Enhancing Multimodal Entity and Relation Extraction With Variational Information Bottleneck. 1274-1285
- Yizhou Tan, Haojun Ai, Shengchen Li, Mark D. Plumbley:
Acoustic Scene Classification Across Cities and Devices via Feature Disentanglement. 1286-1297
- Orel Ben Zaken, Anurag Kumar, Vladimir Tourbabin, Boaz Rafaely:
Neural-Network-Based Direction-of-Arrival Estimation for Reverberant Speech - The Importance of Energetic, Temporal, and Spatial Information. 1298-1309
- Changsheng Quan, Xiaofei Li:
SpatialNet: Extensively Learning Spatial Information for Multichannel Joint Speech Separation, Denoising and Dereverberation. 1310-1323
- Matthew Baas, Herman Kamper:
Disentanglement in a GAN for Unconditional Speech Synthesis. 1324-1335
- Xian Li, Nian Shao, Xiaofei Li:
Self-Supervised Audio Teacher-Student Transformer for Both Clip-Level and Frame-Level Tasks. 1336-1351
- Yifan Chen, Gaofeng Cheng, Runyan Yang, Pengyuan Zhang, Yonghong Yan:
Interrelate Training and Clustering for Online Speaker Diarization. 1352-1364
- Sheng Feng, Xiaoqian Zhu, Shuqing Ma:
Masking Hierarchical Tokens for Underwater Acoustic Target Recognition With Self-Supervised Learning. 1365-1379
- Yangyang Zhao, Kai Yin, Zhenyu Wang, Mehdi Dastani, Shihan Wang:
Decomposed Deep Q-Network for Coherent Task-Oriented Dialogue Policy Learning. 1380-1391
- Jayneel Parekh, Sanjeel Parekh, Pavlo Mozharovskyi, Gaël Richard, Florence d'Alché-Buc:
Tackling Interpretability in Audio Classification Networks With Non-negative Matrix Factorization. 1392-1405
- Xiuying Chen, Shen Gao, Mingzhe Li, Qingqing Zhu, Xin Gao, Xiangliang Zhang:
Write Summary Step-by-Step: A Pilot Study of Stepwise Summarization. 1406-1415
- Changkai Lin, Hongju Cheng, Qiang Rao, Yang Yang:
M$^{3}$SA: Multimodal Sentiment Analysis Based on Multi-Scale Feature Extraction and Multi-Task Learning. 1416-1429
- Ritujoy Biswas, Karan Nathwani, Vinayak Abrol:
Statistically Guided Near-End Speech Intelligibility Improvement Through Voice Transformation and Transfer Learning. 1445-1456
- Linhui Sun, Shuo Yuan, Aifei Gong, Lei Ye, Eng Siong Chng:
Dual-Branch Modeling Based on State-Space Model for Speech Enhancement. 1457-1467
- Alkis Koudounas, Eliana Pastor, Giuseppe Attanasio, Vittorio Mazzia, Manuel Giollo, Thomas Gueudré, Elisa Reale, Luca Cagliero, Sandro Cumani, Luca de Alfaro, Elena Baralis, Daniele Amberti:
Towards Comprehensive Subgroup Performance Analysis in Speech Models. 1468-1480
- Wenmeng Xiong, Changchun Bao, Jing Zhou, Maoshen Jia, José Picheral:
Joint DOA Estimation and Dereverberation Based on Multi-Channel Linear Prediction Filtering and Azimuth Sparsity. 1481-1493
- Rui-Chen Zheng, Yang Ai, Zhen-Hua Ling:
Incorporating Ultrasound Tongue Images for Audio-Visual Speech Enhancement. 1430-1444
- Yehav Alkaher, Israel Cohen:
Howling Detection and Gain Control for Speech Reinforcement in a Noisy Car Cabin Environment. 1494-1505
- Xinfa Zhu, Yi Lei, Tao Li, Yongmao Zhang, Hongbin Zhou, Heng Lu, Lei Xie:
METTS: Multilingual Emotional Text-to-Speech by Cross-Speaker and Cross-Lingual Emotion Transfer. 1506-1518
- Myeonghun Jeong, Minchan Kim, Byoung Jin Choi, Jaesam Yoon, Won Jang, Nam Soo Kim:
Transfer Learning for Low-Resource, Multi-Lingual, and Zero-Shot Multi-Speaker Text-to-Speech. 1519-1530
- Jiadi Yao, Hong Luo, Jun Qi, Xiao-Lei Zhang:
Interpretable Spectrum Transformation Attacks to Speaker Recognition Systems. 1531-1545
- Xiang Chen, Lei Li, Yuqi Zhu, Shumin Deng, Chuanqi Tan, Fei Huang, Luo Si, Ningyu Zhang, Huajun Chen:
Sequence Labeling as Non-Autoregressive Dual-Query Set Generation. 1546-1558
- Lei Liu, Li Liu, Haizhou Li:
Computation and Parameter Efficient Multi-Modal Fusion Transformer for Cued Speech Recognition. 1559-1572
- Adrián Barahona-Ríos, Tom Collins:
NoiseBandNet: Controllable Time-Varying Neural Synthesis of Sound Effects Using Filterbanks. 1573-1585
- Siyuan Wang, Zhongyu Wei, Jiarong Xu, Taishan Li, Zhihao Fan:
Unifying Structure Reasoning and Language Pre-Training for Complex Reasoning Tasks. 1586-1595
- Yijing Chu, Sipei Zhao, Feng Niu, Yongzheng Dong, Yuezhe Zhao:
A New Diffusion Filtered-X Affine Projection Algorithm: Performance Analysis and Application in Windy Environment. 1596-1608
- Yuquan Le, Zhe Quan, Jiawei Wang, Da Cao, Kenli Li:
$\boldsymbol{R}^{2}$: A Novel Recall & Ranking Framework for Legal Judgment Prediction. 1609-1622
- Xiaotong Jiang, Ruirui Bai, Zhongqing Wang, Guodong Zhou:
Cross-Domain Aspect-Based Sentiment Classification With Tripartite Graph Modeling. 1623-1635
- Zhengyang Chen, Bing Han, Shuai Wang, Yanmin Qian:
Attention-Based Encoder-Decoder End-to-End Neural Diarization With Embedding Enhancer. 1636-1649
- Chenfeng Miao, Qingying Zhu, Minchuan Chen, Jun Ma, Shaojun Wang, Jing Xiao:
EfficientTTS 2: Variational End-to-End Text-to-Speech Synthesis and Voice Conversion. 1650-1661
- Orel Peretz, Israel Cohen:
Constant Elevation-Beamwidth Beamforming With Concentric Ring Arrays. 1662-1672
- Zhibin Quan, Chi-Man Vong, Weili Zeng, Wankou Yang:
The MorPhEMe Machine: An Addressable Neural Memory for Learning Knowledge-Regularized Deep Contextualized Chinese Embedding. 1673-1686
- Lijian Gao, Qirong Mao, Ming Dong:
On Local Temporal Embedding for Semi-Supervised Sound Event Detection. 1687-1698
- Xuehao Zhou, Mingyang Zhang, Yi Zhou, Zhizheng Wu, Haizhou Li:
Accented Text-to-Speech Synthesis With Limited Data. 1699-1711
- Vinay Kothapally, John H. L. Hansen:
Monaural Speech Dereverberation Using Deformable Convolutional Networks. 1712-1723
- Taihui Wang, Feiran Yang, Jun Yang:
Multichannel Linear Prediction-Based Speech Dereverberation Considering Sparse and Low-Rank Priors. 1724-1735
- Saurabh Kataria, Jesús Villalba, Laureano Moro-Velázquez, Piotr Zelasko, Najim Dehak:
Time-Domain Speech Super-Resolution With GAN Based Modeling for Telephony Speaker Verification. 1736-1749
- Marco Olivieri, Amy Bastine, Mirco Pezzoli, Fabio Antonacci, Thushara D. Abhayapala, Augusto Sarti:
Acoustic Imaging With Circular Microphone Array: A New Approach for Sound Field Analysis. 1750-1761
- Tengfei Liu, Yongli Hu, Junbin Gao, Yanfeng Sun, Baocai Yin:
Hierarchical Multi-Granularity Interaction Graph Convolutional Network for Long Document Classification. 1762-1775
- Etienne Thuillier, Craig T. Jin, Vesa Välimäki:
HRTF Interpolation Using a Spherical Neural Process Meta-Learner. 1790-1802
- Xun Gong, Yu Wu, Jinyu Li, Shujie Liu, Rui Zhao, Xie Chen, Yanmin Qian:
Advanced Long-Content Speech Recognition With Factorized Neural Transducer. 1803-1815
- Yoshiki Masuyama, Kouei Yamaoka, Takao Kawamura, Nobutaka Ono:
Efficient Joint Optimization of Sampling Rate Offsets Using Entire Multichannel Signal. 1816-1828
- Takaaki Saeki, Soumi Maiti, Xinjian Li, Shinji Watanabe, Shinnosuke Takamichi, Hiroshi Saruwatari:
Text-Inductive Graphone-Based Language Adaptation for Low-Resource Speech Synthesis. 1829-1844
- Douglas D. O'Shaughnessy:
Review of Methods for Automatic Speaker Verification. 1776-1789
- Yingming Gao, Peter Birkholz, Ya Li:
Articulatory Copy Synthesis Based on the Speech Synthesizer VocalTractLab and Convolutional Recurrent Neural Networks. 1845-1858
- Théo Mariotte, Anthony Larcher, Silvio Montrésor, Jean-Hugh Thomas:
Channel-Combination Algorithms for Robust Distant Voice Activity and Overlapped Speech Detection. 1859-1872
- Luciana M. X. de Souza, Márcio H. Costa, Renata Coelho Borges:
Envelope-Based Multichannel Noise Reduction for Cochlear Implant Applications. 1873-1884
- Linjian Li, Yi Cai, Xin Wu:
Unsupervised Disentanglement Learning Model for Exemplar-Guided Paraphrase Generation. 1885-1900
- Amir Ivry, Israel Cohen, Baruch Berdugo:
A User-Centric Approach for Deep Residual-Echo Suppression in Double-Talk. 1901-1914
- Geng Zhang, Jin Liu, Guangyou Zhou, Kunsong Zhao, Zhiwen Xie, Bo Huang:
Question-Directed Reasoning With Relation-Aware Graph Attention Network for Complex Question Answering Over Knowledge Graph. 1915-1927
- Yu Yao, Peng Yang, Guangzhen Zhao, Guoshun Yin:
KGAgent: Learning a Deep Reinforced Agent for Keyphrase Generation. 1928-1940
- Jiahong Li, Chenda Li, Yifei Wu, Yanmin Qian:
Unified Cross-Modal Attention: Robust Audio-Visual Speech Recognition and Beyond. 1941-1953
- Mieszko Fras, Konrad Kowalczyk:
Reverberant Source Separation Using NTF With Delayed Subsources and Spatial Priors. 1954-1967
- Rui Wang, Li Li, Tomoki Toda:
Dual-Channel Target Speaker Extraction Based on Conditional Variational Autoencoder and Directional Information. 1968-1979
- Qinyu Han, Zhihao Yang, Hongfei Lin, Tian Qin:
Let Topic Flow: A Unified Topic-Guided Segment-Wise Dialogue Summarization Framework. 2021-2032
- Haonan Cheng, Shulin Liu, Zhicheng Lian, Long Ye, Qin Zhang:
MusicECAN: An Automatic Denoising Network for Music Recordings With Efficient Channel Attention. 2033-2049
- Guy Gubnitky, Roee Diamant:
Detecting the Presence of Sperm Whales' Echolocation Clicks in Noisy Environments. 2050-2061
- Yuxia Wu, Tianhao Dai, Zhedong Zheng, Lizi Liao:
Active Discovering New Slots for Task-Oriented Conversation. 2062-2072
- Jacob Hollebon, Filippo Maria Fazi:
Dynamic Higher-Order Stereophony. 2073-2084
- Aidan O. T. Hogg, Mads Jenkins, He Liu, Isaac Squires, Samuel J. Cooper, Lorenzo Picinali:
HRTF Upsampling With a Generative Adversarial Network Using a Gnomonic Equiangular Projection. 2085-2099
- Yusheng Liao, Yanfeng Wang, Yu Wang:
Leveraging Diverse Modeling Contexts With Collaborating Learning for Neural Machine Translation. 2100-2111
- Shuo Li, Xiaojun Bi, Tao Liu, Zheng Chen:
Information Dropping Data Augmentation for Machine Translation Quality Estimation. 2112-2124
- Shuoran Jiang, Qingcai Chen, Yang Xiang, Youcheng Pan, Xiangping Wu:
BaSFormer: A Balanced Sparsity Regularized Attention Network for Transformer. 2125-2140
- Morgan Buisson, Brian McFee, Slim Essid, Hélène C. Crayencour:
Self-Supervised Learning of Multi-Level Audio Representations for Music Segmentation. 2141-2152
- Cong Ma, Xu Han, Linghui Wu, Yaping Zhang, Yang Zhao, Yu Zhou, Chengqing Zong:
Modal Contrastive Learning Based End-to-End Text Image Machine Translation. 2153-2165
- Ruiyu Liang, Yue Xie, Jiaming Cheng, Cong Pang, Björn W. Schuller:
A Non-Invasive Speech Quality Evaluation Algorithm for Hearing Aids With Multi-Head Self-Attention and Audiogram-Based Features. 2166-2176
- Ziqiang Zhang, Sanyuan Chen, Long Zhou, Yu Wu, Shuo Ren, Shujie Liu, Zhuoyuan Yao, Xun Gong, Li-Rong Dai, Jinyu Li, Furu Wei:
SpeechLM: Enhanced Speech Pre-Training With Unpaired Textual Data. 2177-2187
- Rui Liu, Berrak Sisman, Guanglai Gao, Haizhou Li:
Controllable Accented Text-to-Speech Synthesis With Fine and Coarse-Grained Intensity Rendering. 2188-2201
- Kshitij Mishra, Mauajama Firdaus, Asif Ekbal:
Please Donate to Save a Life: Inducing Politeness to Handle Resistance in Persuasive Dialogue Agents. 2202-2212
- Hirokazu Kameoka, Takuhiro Kaneko, Kou Tanaka, Nobukatsu Hojo, Shogo Seki:
VoiceGrad: Non-Parallel Any-to-Many Voice Conversion With Annealed Langevin Dynamics. 2213-2226
- Florian Schmid, Khaled Koutini, Gerhard Widmer:
Dynamic Convolutional Neural Networks as Efficient Pre-Trained Audio Models. 2227-2241
- Michael Neri, Archontis Politis, Daniel Aleksander Krause, Marco Carli, Tuomas Virtanen:
Speaker Distance Estimation in Enclosures From Single-Channel Audio. 2242-2254
- Triantafyllos Kefalas, Yannis Panagakis, Maja Pantic:
Large-Scale Unsupervised Audio Pre-Training for Video-to-Speech Synthesis. 2255-2268
- Ju-ho Kim, Jungwoo Heo, Hyun-seo Shin, Chan-yeong Lim, Ha-Jin Yu:
FA-ExU-Net: The Simultaneous Training of an Embedding Extractor and Enhancement Model for a Speaker Verification System Robust to Short Noisy Utterances. 2269-2282
- Yang Ai, Zhen-Hua Ling:
Low-Latency Neural Speech Phase Prediction Based on Parallel Estimation Architecture and Anti-Wrapping Losses for Speech Generation Tasks. 2283-2296
- Yanxiong Li, Jialong Li, Yongjie Si, Jiaxin Tan, Qianhua He:
Few-Shot Class-Incremental Audio Classification With Adaptive Mitigation of Forgetting and Overfitting. 2297-2311
- Tianchi Liu, Kong Aik Lee, Qiongqiong Wang, Haizhou Li:
Golden Gemini is All You Need: Finding the Sweet Spots for Speaker Verification. 2324-2337
- Lei Zhao, Wenbo Zhu, Shengqiang Li, Hong Luo, Xiao-Lei Zhang, Susanto Rahardja:
Multi-Resolution Convolutional Residual Neural Networks for Monaural Speech Dereverberation. 2338-2351
- Christian Geishauser, Carel van Niekerk, Nurul Lubis, Hsien-Chin Lin, Michael Heck, Shutong Feng, Benjamin Matthias Ruppik, Renato Vukovic, Milica Gasic:
Learning With an Open Horizon in Ever-Changing Dialogue Circumstances. 2352-2366
- Yusuf Eren, Buket Çolak Güvenç, Engin Cemal Mengüç:
Cost-Effective Acoustic Feedback Cancellers for Digital Hearing Aids. 2367-2377
- Jianchen Li, Jiqing Han, Fan Qian, Tieran Zheng, Yongjun He, Guibin Zheng:
Distance Metric-Based Open-Set Domain Adaptation for Speaker Verification. 2378-2390
- Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Kunio Kashino:
Masked Modeling Duo: Towards a Universal Audio Pre-Training Framework. 2391-2406
- Guangzhi Sun, Chao Zhang, Philip C. Woodland:
Graph Neural Networks for Contextual ASR With the Tree-Constrained Pointer Generator. 2407-2417
- Bengt J. Borgström, Michael S. Brandstein:
A Multiscale Autoencoder (MSAE) Framework for End-to-End Neural Network Speech Enhancement. 2418-2431
- Kun Wei, Bei Li, Hang Lv, Quan Lu, Ning Jiang, Lei Xie:
Conversational Speech Recognition by Learning Audio-Textual Cross-Modal Contextual Representation. 2432-2444
- Shang-Yu Su, Yung-Sung Chung, Yun-Nung Chen:
Joint Dual Learning With Mutual Information Maximization for Natural Language Understanding and Generation in Dialogues. 2445-2452
- Cunhang Fan, Mingming Ding, Jianhua Tao, Ruibo Fu, Jiangyan Yi, Zhengqi Wen, Zhao Lv:
Dual-Branch Knowledge Distillation for Noise-Robust Synthetic Speech Detection. 2453-2466
- Hassan Taherian, DeLiang Wang:
Multi-Channel Conversational Speaker Separation via Neural Diarization. 2467-2476
- Sherif Abdulatif, Ruizhe Cao, Bin Yang:
CMGAN: Conformer-Based Metric-GAN for Monaural Speech Enhancement. 2477-2493
- Puhai Yang, Heyan Huang, Shumin Shi, Xian-Ling Mao:
STN4DST: A Scalable Dialogue State Tracking Based on Slot Tagging Navigation. 2494-2507
- Hang Chen, Qing Wang, Jun Du, Bao-Cai Yin, Jia Pan, Chin-Hui Lee:
Optimizing Audio-Visual Speech Enhancement Using Multi-Level Distortion Measures for Audio-Visual Speech Recognition. 2508-2521
- Anderson Queiroz, Rosângela Coelho:
Harmonic Detection From Noisy Speech With Auditory Frame Gain for Intelligibility Enhancement. 2522-2531
- Maodi Hu, Li Qian, Zhijun Chang, Zhixiong Zhang:
KDPG-Enhanced MRC Framework for Scientific Entity Recognition in Survey Papers. 2532-2543
- Leanne Nortje, Dan Oneata, Herman Kamper:
Visually Grounded Few-Shot Word Learning in Low-Resource Settings. 2544-2554
- Purnima Kamath, Chitralekha Gupta, Lonce Wyse, Suranga Nanayakkara:
Example-Based Framework for Perceptually Guided Audio Texture Generation. 2555-2565
- Arka Roy, Udit Satija:
A Novel Multi-Head Self-Organized Operational Neural Network Architecture for Chronic Obstructive Pulmonary Disease Detection Using Lung Sounds. 2566-2575
- Rongzhi Gu, Yi Luo:
ReZero: Region-Customizable Sound Extraction. 2576-2589
- Wenbin Wang, Yang Song, Sanjay K. Jha:
USAT: A Universal Speaker-Adaptive Text-to-Speech Approach. 2590-2604 - Han Han, Vincent Lostanlen, Mathieu Lagrange:
Learning to Solve Inverse Problems for Perceptual Sound Matching. 2605-2615 - Nursadul Mamun, John H. L. Hansen:
Speech Enhancement for Cochlear Implant Recipients Using Deep Complex Convolution Transformer With Frequency Transformation. 2616-2629 - Cheng Peng, Haobo Wang, Jue Wang, Lidan Shou, Ke Chen, Gang Chen, Chang Yao:
Learning Label-Adaptive Representation for Large-Scale Multi-Label Text Classification. 2630-2640 - Junchuan Zhao, Low Qi Hong Chetwin, Ye Wang:
SinTechSVS: A Singing Technique Controllable Singing Voice Synthesis System. 2641-2653 - Hyung-Seok Oh, Sang-Hoon Lee, Seong-Whan Lee:
DiffProsody: Diffusion-Based Latent Prosody Generation for Expressive Speech Synthesis With Prosody Conditional Adversarial Training. 2654-2666 - Stefano Damiano, Federico Borra, Alberto Bernardini, Fabio Antonacci, Augusto Sarti:
A Compressive Sensing Approach for the Reconstruction of the Soundfield Produced by Directive Sources in Reverberant Rooms. 2667-2679 - Jiaming Cheng, Ruiyu Liang, Lin Zhou, Li Zhao, Chengwei Huang, Björn W. Schuller:
Residual Fusion Probabilistic Knowledge Distillation for Speech Enhancement. 2680-2691 - Shih-Lun Wu, Chris Donahue, Shinji Watanabe, Nicholas J. Bryan:
Music ControlNet: Multiple Time-Varying Controls for Music Generation. 2692-2703 - Youzhi Tu, Man-Wai Mak, Jen-Tzung Chien:
Contrastive Self-Supervised Speaker Embedding With Sequential Disentanglement. 2704-2715 - Kavya Ranjan Saxena, Vipul Arora:
Interactive Singing Melody Extraction Based on Active Adaptation. 2729-2738 - Shuoran Jiang, Youcheng Pan, Qingcai Chen, Yang Xiang, Xiangping Wu:
Learning to Improve Out-of-Distribution Generalization via Self-Adaptive Language Masking. 2739-2750 - Alexander Shirnin, Nikita Andreev, Sofia Potapova, Ekaterina Artemova:
Analyzing the Robustness of Vision & Language Models. 2751-2763 - Han Ding, Linwei Zhai, Cui Zhao, Fei Wang, Ge Wang, Wei Xi, Zhi Wang, Jizhong Zhao:
Genre Classification Empowered by Knowledge-Embedded Music Representation. 2764-2776 - Lester Phillip Violeta, Ding Ma, Wen-Chin Huang, Tomoki Toda:
Pretraining and Adaptation Techniques for Electrolaryngeal Speech Recognition. 2777-2789 - Yanjie Sun, Kele Xu, Chaorun Liu, Yong Dou, Huaimin Wang, Bo Ding, Qinghua Pan:
Automated Data Augmentation for Audio Classification. 2716-2728 - Marc Arnela, Oriol Guasch:
Formant Frequency Tuning of Three-Dimensional MRI-Based Vocal Tracts for the Finite Element Synthesis of Vowels. 2790-2799 - Xiang Huang, Hao Peng, Dongcheng Zou, Zhiwei Liu, Jianxin Li, Kay Liu, Jia Wu, Jianlin Su, Philip S. Yu:
CoSENT: Consistent Sentence Embedding via Similarity Ranking. 2800-2813 - Yangfu Li, Jiapan Gan, Xiaodan Lin, Yingqiang Qiu, Hongjian Zhan, Hui Tian:
DS-TDNN: Dual-Stream Time-Delay Neural Network With Global-Aware Filter for Speaker Verification. 2814-2827 - Hao Zhang, Yixuan Zhang, Meng Yu, Dong Yu:
Enhanced Acoustic Howling Suppression via Hybrid Kalman Filter and Deep Learning Models. 2828-2840 - Zhuoyuan Mao, Chenhui Chu, Sadao Kurohashi:
EMS: Efficient and Effective Massively Multilingual Sentence Embedding Learning. 2841-2856 - Ernst Seidel, Pejman Mowlaee, Tim Fingscheidt:
Convergence and Performance Analysis of Classical, Hybrid, and Deep Acoustic Echo Control. 2857-2870 - Haohe Liu, Yi Yuan, Xubo Liu, Xinhao Mei, Qiuqiang Kong, Qiao Tian, Yuping Wang, Wenwu Wang, Yuxuan Wang, Mark D. Plumbley:
AudioLDM 2: Learning Holistic Audio Generation With Self-Supervised Pretraining. 2871-2883 - Shu-Wen Yang, Heng-Jui Chang, Zili Huang, Andy T. Liu, Cheng-I Lai, Haibin Wu, Jiatong Shi, Xuankai Chang, Hsiang-Sheng Tsai, Wen-Chin Huang, Tzu-hsun Feng, Po-Han Chi, Yist Y. Lin, Yung-Sung Chuang, Tzu-Hsien Huang, Wei-Cheng Tseng, Kushal Lakhotia, Shang-Wen Li, Abdelrahman Mohamed, Shinji Watanabe, Hung-yi Lee:
A Large-Scale Evaluation of Speech Foundation Models. 2884-2899 - Liang Xu, Xiaoxuan Bu, Xuetao Tian:
Dynamic Prompt-Driven Zero-Shot Relation Extraction. 2900-2912 - Dongchao Yang, Songxiang Liu, Rongjie Huang, Chao Weng, Helen Meng:
InstructTTS: Modelling Expressive TTS in Discrete Latent Space With Natural Language Style Prompt. 2913-2925 - Zhichao Wang, Liumeng Xue, Qiuqiang Kong, Lei Xie, Yuanzhe Chen, Qiao Tian, Yuping Wang:
Multi-Level Temporal-Channel Speaker Retrieval for Zero-Shot Voice Conversion. 2926-2937 - Youngjae Chang, Youngjoong Ko:
Two-Step Masked Language Model for Domain-Adapting Multi-Modal Task-Oriented Dialogue Systems. 2938-2943 - Jixun Yao, Qing Wang, Pengcheng Guo, Ziqian Ning, Lei Xie:
Distinctive and Natural Speaker Anonymization via Singular Value Transformation-Assisted Matrix. 2944-2956 - Congcong Sun, Hui Tian, Peng Tian, Haizhou Li, Zhenxing Qian:
Multi-Agent Deep Learning for the Detection of Multiple Speech Steganography Methods. 2957-2972 - Xianrui Wang, Yichen Yang, Andreas Brendel, Tetsuya Ueda, Shoji Makino, Jacob Benesty, Walter Kellermann, Jingdong Chen:
On Semi-Blind Source Separation-Based Approaches to Nonlinear Echo Cancellation Based on Bilinear Alternating Optimization. 2973-2987 - Zhihua Fang, Liang He, Lin Li, Ying Hu:
Improving Speaker Verification With Noise-Aware Label Ensembling and Sample Selection: Learning and Correcting Noisy Speaker Labels. 2988-3001 - Yuanhang Zheng, Zhixing Tan, Peng Li, Yang Liu:
Black-Box Prompt Tuning With Subspace Learning. 3002-3013 - Shuai Zhao, Luu Anh Tuan, Jie Fu, Jinming Wen, Weiqi Luo:
Exploring Clean Label Backdoor Attacks and Defense in Language Models. 3014-3024 - Zilu Guo, Qing Wang, Jun Du, Jia Pan, Qing-Feng Liu, Chin-Hui Lee:
A Variance-Preserving Interpolation Approach for Diffusion Models With Applications to Single Channel Speech Enhancement and Recognition. 3025-3038 - Ankita, Shambhavi, S. Shahnawazuddin:
Effect of Modeling Glottal Activity Parameters on Zero-Shot Children's ASR. 3039-3048 - Hao Shi, Masato Mimura, Tatsuya Kawahara:
Waveform-Domain Speech Enhancement Using Spectrogram Encoding for Robust Speech Recognition. 3049-3060 - Xiao Wei, Yuhang Li, Yuke Si, Longbiao Wang, Xiaobao Wang, Jianwu Dang:
A Prompt-Based Hierarchical Pipeline for Cross-Domain Slot Filling. 3061-3075 - Xueqin Luo, Jilu Jin, Gongping Huang, Jingdong Chen, Jacob Benesty:
Design of Fully Steerable Differential Beamformers With Linear Superarrays. 3076-3089 - Niels de Koeijer, Martin Bo Møller, Jorge Martínez, Pablo Martínez-Nuevo, Richard C. Hendriks:
Block-Based Perceptually Adaptive Sound Zones With Reproduction Error Constraints. 3090-3100 - Weize Chen, Xu Han, Yankai Lin, Kaichen He, Ruobing Xie, Jie Zhou, Zhiyuan Liu, Maosong Sun:
Hyperbolic Pre-Trained Language Model. 3101-3112 - Jinghui Qin, Zhongzhan Huang, Ying Zeng, Quanshi Zhang, Liang Lin:
An Introspective Data Augmentation Method for Training Math Word Problem Solvers. 3113-3127 - Wenqi Zhang, Yongliang Shen, Guiyang Hou, Kuangyi Wang, Weiming Lu:
Specialized Mathematical Solving by a Step-By-Step Expression Chain Generation. 3128-3140 - Viktor Gunnarsson:
Spectral Correction of Audio Objects in Stereophonic Rendering. 3141-3156 - Zijie Wang, Yegui Xiao, Yaping Ma, Liying Ma, Khashayar Khorasani:
A New Hybrid Active Noise Control System With Input-Power-Controlled Online Secondary-Path Modeling. 3157-3170 - Tomoki Matsunaga, Hiroaki Saito:
Multi-Layer Combined Frequency and Periodicity Representations for Multi-Pitch Estimation of Multi-Instrument Music. 3171-3184 - Shuming Luan, Yukoh Wakabayashi, Tomoki Toda:
Unequally Spaced Sound Field Interpolation for Rotation-Robust Beamforming. 3185-3199 - Liang Tao, Maoshen Jia, Changchun Bao, Wenmeng Xiong:
First-Order Relative Harmonic Coefficient-Based Time-Frequency Points Selection for Multi-Source DOA Estimation. 3200-3212 - Bolaji Yusuf, Murat Saraçlar:
Written Term Detection Improves Spoken Term Detection. 3213-3223 - Mateusz Guzik, Konrad Kowalczyk:
On Ambisonic Source Separation With Spatially Informed Non-Negative Tensor Factorization. 3238-3255 - Yang Ai, Xiao-Hang Jiang, Ye-Xin Lu, Hui-Peng Du, Zhen-Hua Ling:
APCodec: A Neural Audio Codec With Parallel Amplitude and Phase Spectrum Encoding and Decoding. 3256-3269 - Xuan Feng, Tianlong Gu, Liang Chang, Xiaoli Liu:
PROTECT: Parameter-Efficient Tuning for Few-Shot Robust Chinese Text Correction. 3270-3282 - Moti Lugasi, Jacob Donley, Anjali Menon, Vladimir Tourbabin, Boaz Rafaely:
Multi-Channel to Multi-Channel Noise Reduction and Reverberant Speech Preservation in Time-Varying Acoustic Scenes for Binaural Reproduction. 3283-3295 - Li Wang, Lingyun Yu, Yongdong Zhang, Hongtao Xie:
Generalizable Speech Spoofing Detection Against Silence Trimming With Data Augmentation and Multi-Task Meta-Learning. 3296-3310 - Xinhao Mei, Xubo Liu, Jianyuan Sun, Mark D. Plumbley, Wenwu Wang:
Towards Generating Diverse Audio Captions via Adversarial Training. 3311-3323 - Dong-Jiang Zhang, Wei-Tao Zhang, Yuying Ma, Zhen-Zhen Huang:
Anti-Aliasing Speech DOA Estimation Under Spatial Aliasing Conditions. 3324-3338 - Xinhao Mei, Chutong Meng, Haohe Liu, Qiuqiang Kong, Tom Ko, Chengqi Zhao, Mark D. Plumbley, Yuexian Zou, Wenwu Wang:
WavCaps: A ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset for Audio-Language Multimodal Research. 3339-3354 - Xiaofei Wang, Manthan Thakker, Zhuo Chen, Naoyuki Kanda, Sefik Emre Eskimez, Sanyuan Chen, Min Tang, Shujie Liu, Jinyu Li, Takuya Yoshioka:
SpeechX: Neural Codec Language Model as a Versatile Speech Transformer. 3355-3364 - Xincheng Yu, Dongyue Guo, Jianwei Zhang, Yi Lin:
ROSE: A Recognition-Oriented Speech Enhancement Framework in Air Traffic Control Using Multi-Objective Learning. 3365-3378 - Gwendal Le Vaillant, Thierry Dutoit:
Latent Space Interpolation of Synthesizer Parameters Using Timbre-Regularized Auto-Encoders. 3379-3392 - Jagabandhu Mishra, S. R. Mahadeva Prasanna:
Implicit Self-Supervised Language Representation for Spoken Language Diarization. 3393-3407 - Xiaoyi Qin, Na Li, Shufei Duan, Ming Li:
Investigating Long-Term and Short-Term Time-Varying Speaker Verification. 3408-3423 - Hao Gao, Junlong Ren, Jiazheng Cheng, Yong Shen:
Optimal Modal Decomposition for Directionally Biased Sound Field Recording. 3424-3436 - Pijian Li, Qingbao Huang, Zhigang Li, Yi Cai, Feng Shuang, Qing Li:
Multi-Granularity Feature Fusion for Image-Guided Story Ending Generation. 3437-3449 - Federico Landini, Mireia Díez, Themos Stafylakis, Lukás Burget:
DiaPer: End-to-End Neural Diarization With Perceiver-Based Attractors. 3450-3465 - Markéta Rezácková, Daniel Tihelka, Jindrich Matousek:
T5G2P: Text-to-Text Transfer Transformer Based Grapheme-to-Phoneme Conversion. 3466-3476 - Michele Panariello, Natalia A. Tomashenko, Xin Wang, Xiaoxiao Miao, Pierre Champion, Hubert Nourtel, Massimiliano Todisco, Nicholas W. D. Evans, Emmanuel Vincent, Junichi Yamagishi:
The VoicePrivacy 2022 Challenge: Progress and Perspectives in Voice Anonymisation. 3477-3491 - Qi Wang, Mingkuan Liu, Changchun Bao, Maoshen Jia:
Harmonic-Aware Frequency and Time Attention for Automatic Piano Transcription. 3492-3506 - Keqi Deng, Philip C. Woodland:
Label-Synchronous Neural Transducer for Adaptable Online E2E Speech Recognition. 3507-3516 - Elisa Tengan, Thomas Dietzen, Filip Elvander, Toon van Waterschoot:
Multi-Source Direction-of-Arrival Estimation Using Steered Response Power and Group-Sparse Optimization. 3517-3531 - Danwei Cai, Ming Li:
Leveraging ASR Pretrained Conformers for Speaker Verification Through Transfer Learning and Knowledge Distillation. 3532-3545 - Dejan Porjazovski, Tamás Grósz, Mikko Kurimo:
From Raw Speech to Fixed Representations: A Comprehensive Evaluation of Speech Embedding Techniques. 3546-3560 - Shujie Hu, Xurong Xie, Mengzhe Geng, Zengrui Jin, Jiajun Deng, Guinan Li, Yi Wang, Mingyu Cui, Tianzi Wang, Helen Meng, Xunying Liu:
Self-Supervised ASR Models and Features for Dysarthric and Elderly Speech Recognition. 3561-3575 - Jinmeng Wu, Tingting Mu, Jeyarajan Thiyagalingam, Hanyu Hong, Yanbin Hao, Tianxu Zhang, John Yannis Goulermas:
Iterative Semantic Transformer by Greedy Distillation for Community Question Answering. 3576-3588 - Tsubasa Ochiai, Kazuma Iwamoto, Marc Delcroix, Rintaro Ikeshita, Hiroshi Sato, Shoko Araki, Shigeru Katagiri:
Rethinking Processing Distortions: Disentangling the Impact of Speech Enhancement Errors on Speech Recognition Performance. 3589-3602 - Ruiteng Zhang, Jianguo Wei, Xugang Lu, Wenhuan Lu, Di Jin, Lin Zhang, Junhai Xu:
Unsupervised Adaptive Speaker Recognition by Coupling-Regularized Optimal Transport. 3603-3617 - Yuqing Li, Xianke Wang, Ruimin Wu, Wei Xu, Wenqing Cheng:
A Two-Stage Audio-Visual Fusion Piano Transcription Model Based on the Attention Mechanism. 3618-3630 - Yujia Qin, Xiaozhi Wang, Yusheng Su, Yankai Lin, Ning Ding, Jing Yi, Weize Chen, Zhiyuan Liu, Juanzi Li, Lei Hou, Peng Li, Maosong Sun, Jie Zhou:
Exploring Universal Intrinsic Task Subspace for Few-Shot Learning via Prompt Tuning. 3631-3643 - Ho-Lam Chung, Ying-Hong Chan, Yao-Chung Fan:
Handover QG: Question Generation by Decoder Fusion and Reinforcement Learning. 3644-3655 - Yinlong Xiao, Zongcheng Ji, Jianqiang Li, Mei Han:
MVT: Chinese NER Using Multi-View Transformer. 3656-3668 - Ziqi Yuan, Jingliang Fang, Hua Xu, Kai Gao:
Multimodal Consistency-Based Teacher for Semi-Supervised Multimodal Sentiment Analysis. 3669-3683 - Sara Atito Ali Ahmed, Muhammad Awais, Wenwu Wang, Mark D. Plumbley, Josef Kittler:
ASiT: Local-Global Audio Spectrogram Vision Transformer for Event Classification. 3684-3693 - Xiachong Feng, Xiaocheng Feng, Xiyuan Du, Min-Yen Kan, Bing Qin:
Adapter-Based Selective Knowledge Distillation for Federated Multi-Domain Meeting Summarization. 3694-3708 - Tianrui Wang, Long Zhou, Ziqiang Zhang, Yu Wu, Shujie Liu, Yashesh Gaur, Zhuo Chen, Jinyu Li, Furu Wei:
VioLA: Conditional Language Models for Speech Recognition, Synthesis, and Translation. 3709-3716 - Angelo Cesar Mendes da Silva, Diego Furtado Silva, Ricardo Marcondes Marcacini:
Artist Similarity Based on Heterogeneous Graph Neural Networks. 3717-3729 - Kai-Wei Chang, Haibin Wu, Yu-Kai Wang, Yuan-Kuei Wu, Hua Shen, Wei-Cheng Tseng, Iu-thing Kang, Shang-wen Li, Hung-Yi Lee:
SpeechPrompt: Prompting Speech Language Models for Speech Processing Tasks. 3730-3744 - Matteo Scerbo, Lauri Savioja, Enzo De Sena:
Room Acoustic Rendering Networks With Control of Scattering and Early Reflections. 3745-3758