


IEEE/ACM Transactions on Audio, Speech and Language Processing, Volume 31
Volume 31, 2023
- Mrinmoy Bhattacharjee, S. R. M. Prasanna, Prithwijit Guha:
Clean vs. Overlapped Speech-Music Detection Using Harmonic-Percussive Features and Multi-Task Learning. 1-10
- Zhaojie Luo, Shoufeng Lin, Rui Liu, Jun Baba, Yuichiro Yoshikawa, Hiroshi Ishiguro:
Decoupling Speaker-Independent Emotions for Voice Conversion via Source-Filter Networks. 11-24
- Jinchuan Tian, Jianwei Yu, Chao Weng, Yuexian Zou, Dong Yu:
Integrating Lattice-Free MMI Into End-to-End Speech Recognition. 25-38
- Ravi Shankar, Hsi-Wei Hsieh, Nicolas Charon, Archana Venkataraman:
A Diffeomorphic Flow-Based Variational Framework for Multi-Speaker Emotion Conversion. 39-53
- Ryandhimas E. Zezario, Szu-Wei Fu, Fei Chen, Chiou-Shann Fuh, Hsin-Min Wang, Yu Tsao:
Deep Learning-Based Non-Intrusive Multi-Objective Speech Assessment Model With Cross-Domain Features. 54-70
- Xiaoyi Qin, Danwei Cai, Ming Li:
Robust Multi-Channel Far-Field Speaker Verification Under Different In-Domain Data Availability Scenarios. 71-85
- Vikram C. Mathad, Julie M. Liss, Kathy Chapman, Nancy Scherer, Visar Berisha:
Consonant-Vowel Transition Models Based on Deep Learning for Objective Evaluation of Articulation. 86-95
- Li Li, Hirokazu Kameoka, Shoji Makino:
FastMVAE2: On Improving and Accelerating the Fast Variational Autoencoder-Based Source Separation Algorithm for Determined Mixtures. 96-110
- Jie Wang, Yan Yang, Keyu Liu, Zhiping Zhu, Xiaorong Liu:
M3S: Scene Graph Driven Multi-Granularity Multi-Task Learning for Multi-Modal NER. 111-120
- Marc Delcroix, Jorge Bennasar Vázquez, Tsubasa Ochiai, Keisuke Kinoshita, Yasunori Ohishi, Shoko Araki:
SoundBeam: Target Sound Extraction Conditioned on Sound-Class Labels and Enrollment Clues for Increased Performance and Continuous Learning. 121-136
- Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Kunio Kashino:
BYOL for Audio: Exploring Pre-Trained General-Purpose Audio Representations. 137-151
- Yingrui Xu, Hao Liu, Jingguo Ge, Xiaodan Zhang, Jingyuan Hu, Yulei Wu, Honglei Lv, Hongbin Shi, Wei Zhou:
Mining Weak Relations Between Reviews for Opinion Spam Detection. 152-162
- Yoshiki Masuyama, Kohei Yatabe, Kento Nagatomo, Yasuhiro Oikawa:
Online Phase Reconstruction via DNN-Based Phase Differences Estimation. 163-176
- Jiang Liu, Donghong Ji, Jingye Li, Dongdong Xie, Chong Teng, Liang Zhao, Fei Li:
TOE: A Grid-Tagging Discontinuous NER Model Enhanced by Embedding Tag/Word Relations and More Fine-Grained Tags. 177-187
- Zhe Hu, Zhiwei Cao, Hou Pong Chan, Jiachen Liu, Xinyan Xiao, Jinsong Su, Hua Wu:
Controllable Dialogue Generation With Disentangled Multi-Grained Style Specification and Attribute Consistency Reward. 188-199
- Sondes Abderrazek, Corinne Fredouille, Alain Ghio, Muriel Lalain, Christine Meunier, Virginie Woisard:
Interpreting Deep Representations of Phonetic Features via Neuro-Based Concept Detector: Application to Speech Disorders Due to Head and Neck Cancer. 200-214
- Jie Zhang, Rui Tao, Jun Du, Li-Rong Dai:
Energy-Efficient Sparsity-Driven Speech Enhancement in Wireless Acoustic Sensor Networks. 215-228
- Xianke Wang, Bowen Tian, Weiming Yang, Wei Xu, Wenqing Cheng:
MusicYOLO: A Vision-Based Framework for Automatic Singing Transcription. 229-241
- Yuanyuan Liu, Mittapalle Kiran Reddy, Nelly Penttilä, Tiina Ihalainen, Paavo Alku, Okko Räsänen:
Automatic Assessment of Parkinson's Disease Using Speech Representations of Phonation and Articulation. 242-255
- David Südholt, Alec Wright, Cumhur Erkut, Vesa Välimäki:
Pruning Deep Neural Network Models of Guitar Distortion Effects. 256-264
- Fangkai Jiao, Yangyang Guo, Minlie Huang, Liqiang Nie:
Enhanced Multi-Domain Dialogue State Tracker With Second-Order Slot Interactions. 265-276
- Hui Tian, Yiqin Qiu, Wojciech Mazurczyk, Haizhou Li, Zhenxing Qian:
STFF-SM: Steganalysis Model Based on Spatial and Temporal Feature Fusion for Speech Streams. 277-289
- Gopendra Vikram Singh, Mauajama Firdaus, Asif Ekbal, Pushpak Bhattacharyya:
EmoInt-Trans: A Multimodal Transformer for Identifying Emotions and Intents in Social Conversations. 290-300
- De Hu, Huaiwen Zhang, Feilong Bao, Rui Wang:
Distributed Sampling Rate Offset Estimation Over Acoustic Sensor Networks Based on Asynchronous Network Newton Optimization. 301-312
- David Diaz-Guerra, Antonio Miguel, José Ramón Beltrán:
Direction of Arrival Estimation of Sound Sources Using Icosahedral CNNs. 313-321
- Peiming Guo, Shen Huang, Peijie Jiang, Yueheng Sun, Meishan Zhang, Min Zhang:
Curriculum-Style Fine-Grained Adaption for Unsupervised Cross-Lingual Dependency Transfer. 322-332
- Naveen Kumar Desiraju, Simon Doclo, Markus Buck, Tobias Wolff:
Joint Online Estimation of Early and Late Residual Echo PSD for Residual Echo Suppression. 333-344
- Guangzhi Sun, Chao Zhang, Philip C. Woodland:
Minimising Biasing Word Errors for Contextual ASR With the Tree-Constrained Pointer Generator. 345-354
- Jonah Casebeer, Nicholas J. Bryan, Paris Smaragdis:
Meta-AF: Meta-Learning for Adaptive Filters. 355-370
- Yingwen Fu, Nankai Lin, Boyu Chen, Ziyu Yang, Shengyi Jiang:
Cross-Lingual Named Entity Recognition for Heterogenous Languages. 371-382
- Jun-You Wang, Jyh-Shing Roger Jang:
Training a Singing Transcription Model Using Connectionist Temporal Classification Loss and Cross-Entropy Loss. 383-396
- Zhong-Qiu Wang, Gordon Wichern, Shinji Watanabe, Jonathan Le Roux:
STFT-Domain Neural Speech Enhancement With Very Low Algorithmic Latency. 397-410
- Yu Li, Bojie Hu, Jian Liu, Yufeng Chen, Jinan Xu:
A Neighborhood Re-Ranking Model With Relation Constraint for Knowledge Graph Completion. 411-425
- Alessio Miaschi, Dominique Brunato, Felice Dell'Orletta, Giulia Venturi:
On Robustness and Sensitivity of a Neural Language Model: A Case Study on Italian L1 Learner Errors. 426-438
- Rong Xiao, Yu Wan, Baosong Yang, Haibo Zhang, Huajin Tang, Derek F. Wong, Boxing Chen:
Towards Energy-Preserving Natural Language Understanding With Spiking Neural Networks. 439-447
- Juan Zhao, Tianrui Zong, Yong Xiang, Longxiang Gao, Guang Hua, Keshav Sood, Yushu Zhang:
SSVS-SSVD Based Desynchronization Attacks Resilient Watermarking Method for Stereo Signals. 448-461
- Qiquan Zhang, Xinyuan Qian, Zhaoheng Ni, Aaron Nicolson, Eliathamby Ambikairajah, Haizhou Li:
A Time-Frequency Attention Module for Neural Speech Enhancement. 462-475
- Binhong Xie, Yu Li, Hongyan Zhao, Lihu Pan, Enhui Wang:
A Cross-Attention Fusion Based Graph Convolution Auto-Encoder for Open Relation Extraction. 476-485
- Qian-Bei Hong, Chung-Hsien Wu, Hsin-Min Wang:
Generalization Ability Improvement of Speaker Representation and Anti-Interference for Speaker Verification. 486-499
- Xinglin Lyu, Junhui Li, Min Zhang, Chenchen Ding, Hideki Tanaka, Masao Utiyama:
Refining History for Future-Aware Neural Machine Translation. 500-512
- Mou Wang, Junqi Chen, Xiao-Lei Zhang, Susanto Rahardja:
End-to-End Multi-Modal Speech Recognition on an Air and Bone Conducted Speech Corpus. 513-524
- Asier López-Zorrilla, María Inés Torres, Heriberto Cuayáhuitl:
Audio Embedding-Aware Dialogue Policy Learning. 525-538
- Xichen Shang, Chuxin Chen, Zipeng Chen, Qianli Ma:
Modularized Mutuality Network for Emotion-Cause Pair Extraction. 539-549
- Xinyuan Qian, Zhengdong Wang, Jiadong Wang, Guohui Guan, Haizhou Li:
Audio-Visual Cross-Attention Network for Robotic Speaker Tracking. 550-562
- Kristina Tesch, Timo Gerkmann:
Insights Into Deep Non-Linear Filters for Improved Multi-Channel Speech Enhancement. 563-575
- Thilo von Neumann, Keisuke Kinoshita, Christoph Böddeker, Marc Delcroix, Reinhold Haeb-Umbach:
Segment-Less Continuous Speech Separation of Meetings: Training and Evaluation Criteria. 576-589
- Davide Albertini, Alberto Bernardini, Federico Borra, Fabio Antonacci, Augusto Sarti:
Two-Stage Beamforming With Arbitrary Planar Arrays of Differential Microphone Array Units. 590-602
- Yi-Syuan Chen, Yun-Zhu Song, Hong-Han Shuai:
SPEC: Summary Preference Decomposition for Low-Resource Abstractive Summarization. 603-618
- Yingying Xiao, Shanmou Chen, Qiangqiang Zhang, Dongyuan Lin, Minglin Shen, Junhui Qian, Shiyuan Wang:
Generalized Hyperbolic Tangent Based Random Fourier Conjugate Gradient Filter for Nonlinear Active Noise Control. 619-632
- Jun Qi, Chao-Han Huck Yang, Pin-Yu Chen, Javier Tejedor:
Exploiting Low-Rank Tensor-Train Deep Neural Networks Based on Riemannian Gradient Descent With Illustrations of Speech Processing. 633-642
- Bin Gu, Wu Guo, Jie Zhang:
Memory Storable Network Based Feature Aggregation for Speaker Representation Learning. 643-655
- Takumi Abe, Shoichi Koyama, Natsuki Ueno, Hiroshi Saruwatari:
Amplitude Matching for Multizone Sound Field Control. 656-669
- Mahdi Barhoush, Ahmed Hallawa, Arne Peine, Lukas Martin, Anke Schmeink:
Localization-Driven Speech Enhancement in Noisy Multi-Speaker Hospital Environments Using Deep Learning and Meta Learning. 670-683
- Herman Kamper:
Word Segmentation on Discovered Phone Units With Dynamic Programming and Self-Supervised Scoring. 684-694
- Changheng Li, Jorge Martínez, Richard Christian Hendriks:
Joint Maximum Likelihood Estimation of Microphone Array Parameters for a Reverberant Single Source Scenario. 695-705
- Shota Horiguchi, Shinji Watanabe, Paola García, Yuki Takashima, Yohei Kawaguchi:
Online Neural Diarization of Unlimited Numbers of Speakers Using Global and Local Attractors. 706-720
- Ling He, Jia Fu, Yuanyuan Li, Xi Xiong, Jing Zhang:
WNSA-Net: An Axial-Attention-Based Network for Schizophrenia Detection Using Wideband and Narrowband Spectrograms. 721-733
- Anusha Prakash, Hema A. Murthy:
Exploring the Role of Language Families for Building Indic Speech Synthesisers. 734-747
- Mahdin Rohmatillah, Jen-Tzung Chien:
Hierarchical Reinforcement Learning With Guidance for Multi-Domain Dialogue Policy. 748-761
- Shahram Ghorbani, John H. L. Hansen:
Domain Expansion for End-to-End Speech Recognition: Applications for Accent/Dialect Speech. 762-774
- Weidong Chen, Xiaofen Xing, Xiangmin Xu, Jianxin Pang, Lan Du:
SpeechFormer++: A Hierarchical Efficient Framework for Paralinguistic Speech Processing. 775-788
- Nicki Holighaus, Günther Koliander, Clara Hollomey, Friedrich Pillichshammer:
Grid-Based Decimation for Wavelet Transforms With Stably Invertible Implementation. 789-801
- Weiwei Lin, Man-Wai Mak:
Robust Speaker Verification Using Deep Weight Space Ensemble. 802-812
- Lin Zhang, Xin Wang, Erica Cooper, Nicholas W. D. Evans, Junichi Yamagishi:
The PartialSpoof Database and Countermeasures for the Detection of Short Fake Speech Segments Embedded in an Utterance. 813-825
- Jie Mei, Yufan Wang, Xinhui Tu, Ming Dong, Tingting He:
Incorporating BERT With Probability-Aware Gate for Spoken Language Understanding. 826-834
- Tsubasa Ochiai, Marc Delcroix, Tomohiro Nakatani, Shoko Araki:
Mask-Based Neural Beamforming for Moving Speakers With Self-Attention-Based Tracking. 835-848
- Rongzhi Gu, Shi-Xiong Zhang, Yuexian Zou, Dong Yu:
Towards Unified All-Neural Beamforming for Time and Frequency Domain Speech Separation. 849-862
- Naotake Masuda, Daisuke Saito:
Improving Semi-Supervised Differentiable Synthesizer Sound Matching for Practical Applications. 863-875
- Erfan Loweimi, Zhengjun Yue, Peter Bell, Steve Renals, Zoran Cvetkovic:
Multi-Stream Acoustic Modelling Using Raw Real and Imaginary Parts of the Fourier Transform. 876-890
- Bengt J. Borgström:
A Generative Approach to Condition-Aware Score Calibration for Speaker Verification. 891-901
- Irene Martín-Morató, Annamaria Mesaros:
Strong Labeling of Sound Events Using Crowdsourced Weak Labels and Annotator Competence Estimation. 902-914
- Wenzhao Zhu, Lei Luo, Jinwei Sun, Mads Græsbøll Christensen:
A New Virtual Tracking Sub-Algorithm Based Hybrid Active Control System for Narrowband Noise With Impulsive Interference. 915-926
- Thomas Deppisch, Sebastià Vicenc Amengual Garí, Paul Calamia, Jens Ahrens:
Direct and Residual Subspace Decomposition of Spatial Room Impulse Responses. 927-942
- Eloi Moliner, Vesa Välimäki:
BEHM-GAN: Bandwidth Extension of Historical Music Using Generative Adversarial Networks. 943-956
- Martin Jälmby, Filip Elvander, Toon van Waterschoot:
Low-Rank Room Impulse Response Estimation. 957-969
- Hong Liu, Yucheng Cai, Zhenru Lin, Zhijian Ou, Yi Huang, Junlan Feng:
Variational Latent-State GPT for Semi-Supervised Task-Oriented Dialog Systems. 970-984
- De Hu, Qintuya Si, Rui Liu, Feilong Bao:
Distributed Sensor Selection for Speech Enhancement With Acoustic Sensor Networks. 985-999
- Yingke Zhu, Brian Mak:
Bayesian Self-Attentive Speaker Embeddings for Text-Independent Speaker Verification. 1000-1012
- Yuying Li, Yuchen Liu, Donald S. Williamson:
A Composite T60 Regression and Classification Approach for Speech Dereverberation. 1013-1023
- Hanyi Zhang, Longbiao Wang, Kong Aik Lee, Meng Liu, Jianwu Dang, Helen Meng:
Meta-Generalization for Domain-Invariant Speaker Verification. 1024-1036
- Shutong Niu, Jun Du, Lei Sun, Yu Hu, Chin-Hui Lee:
QDM-SSD: Quality-Aware Dynamic Masking for Separation-Based Speaker Diarization. 1037-1049
- Boyang Lyu, Chunxiao Fan, Yue Ming, Panzi Zhao, Nannan Hu:
En-HACN: Enhancing Hybrid Architecture With Fast Attention and Capsule Network for End-to-end Speech Recognition. 1050-1062
- Yang Liu, Haoqin Sun, Wenbo Guan, Yuqi Xia, Yongwei Li, Masashi Unoki, Zhen Zhao:
A Discriminative Feature Representation Method Based on Cascaded Attention Network With Adversarial Strategy for Speech Emotion Recognition. 1063-1074
- Hao Zhang, Nianwen Si, Yaqi Chen, Wenlin Zhang, Xukui Yang, Dan Qu, Weiqiang Zhang:
Improving Speech Translation by Cross-Modal Multi-Grained Contrastive Learning. 1075-1086
- Wei-Cheng Lin, Carlos Busso:
Sequential Modeling by Leveraging Non-Uniform Distribution of Speech Emotion. 1087-1099
- Achyut Mani Tripathi, Om Jee Pandey:
Divide and Distill: New Outlooks on Knowledge Distillation for Environmental Sound Classification. 1100-1113
- Hao Zhang, Ashutosh Pandey, DeLiang Wang:
Low-Latency Active Noise Control Using Attentive Recurrent Network. 1114-1123
- Avital Bross, Sharon Gannot:
Training-Based Multiple Source Tracking Using Manifold-Learning and Recursive Expectation-Maximization. 1124-1140
- Guimin Hu, Yi Zhao, Guangming Lu:
Emotion Prediction Oriented Method With Multiple Supervisions for Emotion-Cause Pair Extraction. 1141-1152
- Reza Mohsenipour, Daniel Massicotte, Wei-Ping Zhu:
PI Control of Loudspeakers Based on Linear Fractional Order Model. 1153-1162
- Tim Lübeck, Johannes M. Arend, Christoph Pörschmann:
Spatial Upsampling of Sparse Spherical Microphone Array Signals. 1163-1174
- Jiajun Deng, Xurong Xie, Tianzi Wang, Mingyu Cui, Boyang Xue, Zengrui Jin, Guinan Li, Shujie Hu, Xunying Liu:
Confidence Score Based Speaker Adaptation of Conformer Speech Recognition Systems. 1175-1190
- Hongsheng Zhang, Jizhang Gan, Ting Liu, Kui Huang, Hong Yang:
Coefficients-Switched Normalized Least-Mean-Squares Adaption in Echo Canceler of Sparse-Echo-Path. 1191-1199
- Eric Guizzo, Tillman Weyde, Simone Scardapane, Danilo Comminiello:
Learning Speech Emotion Representations in the Quaternion Domain. 1200-1212
- Jiaqi Bai, Ze Yang, Jian Yang, Hongcheng Guo, Zhoujun Li:
KINet: Incorporating Relevant Facts Into Knowledge-Grounded Dialog Generation. 1213-1222
- Haiquan Zhao, Yuan Gao, Yingying Zhu:
Robust Subband Adaptive Filter Algorithms-Based Mixture Correntropy and Application to Acoustic Echo Cancellation. 1223-1233
- Chen Zhang, Luis Fernando D'Haro, Qiquan Zhang, Thomas Friedrichs, Haizhou Li:
PoE: A Panel of Experts for Generalized Automatic Dialogue Assessment. 1234-1250
- Qing Wang, Jun Du, Huaxin Wu, Jia Pan, Feng Ma, Chin-Hui Lee:
A Four-Stage Data Augmentation Approach to ResNet-Conformer Based Acoustic Modeling for Sound Event Localization and Detection. 1251-1264
- Yingwen Fu, Nankai Lin, Xiaohui Yu, Shengyi Jiang:
Self-Training With Double Selectors for Low-Resource Named Entity Recognition. 1265-1275
- Kilian Schulze-Forster, Gaël Richard, Liam Kelley, Clement S. J. Doire, Roland Badeau:
Unsupervised Music Source Separation Using Differentiable Parametric Source Models. 1276-1289
- Yinggang Liu, Hong Fu, Ying Wei, Hanbing Zhang:
Sound Event Classification Based on Frequency-Energy Feature Representation and Two-Stage Data Dimension Reduction. 1290-1304
- Ege Erdem, Zoran Cvetkovic, Hüseyin Hacihabiboglu:
3D Perceptual Soundfield Reconstruction via Virtual Microphone Synthesis. 1305-1317
- Dong-Yuan Shi, Woon-Seng Gan, Bhan Lam, Xiaoyi Shen:
A Frequency-Domain Output-Constrained Active Noise Control Algorithm Based on an Intuitive Circulant Convolutional Penalty Factor. 1318-1332
- Muhammed Zahid Ozturk, Chenshu Wu, Beibei Wang, Min Wu, K. J. Ray Liu:
RadioSES: mmWave-Based Audioradio Speech Enhancement and Separation System. 1333-1347
- Jianwei Zhang, Julie Liss, Suren Jayasuriya, Visar Berisha:
Robust Vocal Quality Feature Embeddings for Dysphonic Voice Detection. 1348-1359
- Ashutosh Pandey, DeLiang Wang:
Attentive Training: A New Training Framework for Speech Enhancement. 1360-1370
- Hirofumi Inaguma, Tatsuya Kawahara:
Alignment Knowledge Distillation for Online Streaming Attention-Based Speech Recognition. 1371-1385
- Mittapalle Kiran Reddy, Paavo Alku:
Exemplar-Based Sparse Representations for Detection of Parkinson's Disease From Speech. 1386-1396
- Shunsuke Kita, Yoshinobu Kajikawa:
Sound Source Localization Inside a Structure Under Semi-Supervised Conditions. 1397-1408
- Guowei Wu, Shipei Liu, Xiaoya Fan:
The Power of Fragmentation: A Hierarchical Transformer Model for Structural Segmentation in Symbolic Music Generation. 1409-1420
- Xueqin Luo, Gongping Huang, Jilu Jin, Jingdong Chen, Jacob Benesty, Wen Zhang, Mengyao Zhu, Chunjian Li:
Design of Maximum Directivity Beamformers With Linear Acoustic Vector Sensor Arrays. 1421-1435
- Ruchao Fan, Wei Chu, Peng Chang, Abeer Alwan:
A CTC Alignment-Based Non-Autoregressive Transformer for End-to-End Automatic Speech Recognition. 1436-1448
- Tianyou Li, Siyuan Lian, Sipei Zhao, Jing Lu, Ian S. Burnett:
Distributed Active Noise Control Based on an Augmented Diffusion FxLMS Algorithm. 1449-1463
- Jiayuan Xie, Wenhao Fang, Qingbao Huang, Yi Cai, Tao Wang:
Enhancing Paraphrase Question Generation With Prior Knowledge. 1464-1475
- Chen Chen, Hansheng Hong, Jie Guo, Bin Song:
Inter-Intra Modal Representation Augmentation With Trimodal Collaborative Disentanglement Network for Multimodal Sentiment Analysis. 1476-1488
- Jian Yang, Yuwei Yin, Liqun Yang, Shuming Ma, Haoyang Huang, Dongdong Zhang, Furu Wei, Zhoujun Li:
GTrans: Grouping and Fusing Transformer Layers for Neural Machine Translation. 1489-1498
- Xin Wu, Yi Cai, Zetao Lian, Ho-fung Leung, Tao Wang:
Generating Natural Language From Logic Expressions With Structural Representation. 1499-1510
- Yi Li, Yang Sun, Wenwu Wang, Syed Mohsen Naqvi:
U-Shaped Transformer With Frequency-Band Aware Attention for Speech Enhancement. 1511-1521
- Christian Antoñanzas, Miguel Ferrer, Maria de Diego, Alberto González:
Remote Microphone Technique for Active Noise Control Over Distributed Networks. 1522-1535
- Yi Zhu, Abhishek Tiwari, João Monteiro, Shruti Rajendra Kshirsagar, Tiago H. Falk:
COVID-19 Detection via Fusion of Modulation Spectrum and Linear Prediction Speech Features. 1536-1549
- Jijie Li, Kai Shuang, Jinyu Guo, Zengyi Shi, Hongman Wang:
Enhancing Semantic Relation Classification With Shortest Dependency Path Reasoning. 1550-1560
- Mao-Kui He, Jun Du, Qing-Feng Liu, Chin-Hui Lee:
ANSD-MA-MSE: Adaptive Neural Speaker Diarization Using Memory-Aware Multi-Speaker Embedding. 1561-1573
- Longting Xu, Jichen Yang, Chang Huai You, Xinyuan Qian, Daiyu Huang:
Device Features Based on Linear Transformation With Parallel Training Data for Replay Speech Detection. 1574-1586
- Huajian Fang, Dennis Becker, Stefan Wermter, Timo Gerkmann:
Integrating Uncertainty Into Neural Network-Based Speech Enhancement. 1587-1600
- Libo Qin, Xiao Xu, Lehan Wang, Yue Zhang, Wanxiang Che:
Modularized Pre-Training for End-to-End Task-Oriented Dialogue. 1601-1610
- Hanlei Zhang, Hua Xu, Shaojie Zhao, Qianrui Zhou:
Learning Discriminative Representations and Decision Boundaries for Open Intent Detection. 1611-1623
- Guangsheng Bao, Yue Zhang:
A General Contextualized Rewriting Framework for Text Summarization. 1624-1635
- Christoph Kirsch, Stephan Dieter Ewert:
A Universal Filter Approximation of Edge Diffraction for Geometrical Acoustics. 1636-1651
- Peyman Goli, Steven van de Par:
Deep Learning-Based Speech Specific Source Localization by Using Binaural and Monaural Microphone Arrays in Hearing Aids. 1652-1666
- Nguyen Binh Thien, Yukoh Wakabayashi, Kenta Iwai, Takanobu Nishiura:
Inter-Frequency Phase Difference for Phase Reconstruction Using Deep Neural Networks and Maximum Likelihood. 1667-1680
- Srikanth Raj Chetupalli, Emanuël A. P. Habets:
Speaker Counting and Separation From Single-Channel Noisy Mixtures. 1681-1692
- Guangyan Zhang, Ying Qin, Wenjie Zhang, Jialun Wu, Mei Li, Yutao Gai, Feijun Jiang, Tan Lee:
iEmoTTS: Toward Robust Cross-Speaker Emotion Transfer and Control for Speech Synthesis Based on Disentanglement Between Prosody and Timbre. 1693-1705
- Ruijie Tao, Kong Aik Lee, Rohan Kumar Das, Ville Hautamäki, Haizhou Li:
Self-Supervised Training of Speaker Encoder With Multi-Modal Diverse Positive Pairs. 1706-1719
- Dongchao Yang, Jianwei Yu, Helin Wang, Wen Wang, Chao Weng, Yuexian Zou, Dong Yu:
Diffsound: Discrete Diffusion Model for Text-to-Sound Generation. 1720-1733
- Paul Konstantin Krug, Peter Birkholz, Branislav Gerazov, Daniel Rudolph van Niekerk, Anqi Xu, Yi Xu:
Artificial Vocal Learning Guided by Phoneme Recognition and Visual Information. 1734-1744
- Qian-Bei Hong, Chung-Hsien Wu, Hsin-Min Wang:
Decomposition and Reorganization of Phonetic Information for Speaker Embedding Learning. 1745-1757
- Wenbin Jiang, Kai Yu:
Speech Enhancement With Integration of Neural Homomorphic Synthesis and Spectral Masking. 1758-1770
- Shu'ang Li, Xuming Hu, Li Lin, Aiwei Liu, Lijie Wen, Philip S. Yu:
A Multi-Level Supervised Contrastive Learning Framework for Low-Resource Natural Language Inference. 1771-1783
- Xiaoqing Zheng:
Building Conventional "Experts" With a Dialogue Logic Programming Language. 1784-1796
- Haitao Lin, Junnan Zhu, Lu Xiang, Feifei Zhai, Yu Zhou, Jiajun Zhang, Chengqing Zong:
Topic-Oriented Dialogue Summarization. 1797-1810
- Haohan Guo, Fenglong Xie, Xixin Wu, Frank K. Soong, Helen Meng:
MSMC-TTS: Multi-Stage Multi-Codebook VQ-VAE Based Neural TTS. 1811-1824
- Bei Liu, Zhengyang Chen, Yanmin Qian:
Depth-First Neural Architecture With Attentive Feature Fusion for Efficient Speaker Verification. 1825-1838
- Ria Ghosh, John H. L. Hansen:
Bilateral Cochlear Implant Processing of Coding Strategies With CCi-MOBILE, an Open-Source Research Platform. 1839-1850
- Aolong Zhou, Wen Zhang, Guojun Xu, Xiaoyong Li, Kefeng Deng, Junqiang Song:
DBSA-Net: Dual Branch Self-Attention Network for Underwater Acoustic Signal Denoising. 1851-1865
- Weiwei Lin, Man-Wai Mak:
Model-Agnostic Meta-Learning for Fast Text-Dependent Speaker Embedding Adaptation. 1866-1876
- Andrea Galassi, Marco Lippi, Paolo Torroni:
Multi-Task Attentive Residual Networks for Argument Mining. 1877-1892
- Yi Luo, Jianwei Yu:
Music Source Separation With Band-Split RNN. 1893-1901
- Keisuke Matsubara, Takuma Okamoto, Ryoichi Takashima, Tetsuya Takiguchi, Tomoki Toda, Hisashi Kawai:
Harmonic-Net: Fundamental Frequency and Speech Rate Controllable Fast Neural Vocoder. 1902-1915
- Yi Zhou, Zhizheng Wu, Xiaohai Tian, Haizhou Li:
Optimization of Cross-Lingual Voice Conversion With Linguistics Losses to Reduce Foreign Accents. 1916-1926
- Qiu-Shi Zhu, Jie Zhang, Ziqiang Zhang, Li-Rong Dai:
A Joint Speech Enhancement and Self-Supervised Representation Learning Framework for Noise-Robust Speech Recognition. 1927-1939
- Siqi Sun, Korin Richmond, Hao Tang:
Improving Seq2Seq TTS Frontends With Transcribed Speech Audio. 1940-1952
- Shih-Lun Wu, Yi-Hsuan Yang:
MuseMorphose: Full-Song and Fine-Grained Piano Music Style Transfer With One Transformer VAE. 1953-1967
- Xiaoxue Gao, Chitralekha Gupta, Haizhou Li:
PoLyScriber: Integrated Fine-Tuning of Extractor and Lyrics Transcriber for Polyphonic Music. 1968-1981
- Zhicheng Lian, Haonan Cheng, Jiawan Zhang:
PQG-A2SA: Performance Quantification Guided Audio-to-Score Alignment for Orchestral Music. 1982-1992
- Jingen Ni, Ningning Zhang, Haofen Li:
Sparsity-Promoting Affine Projection Algorithm With Periodically-Updated Gain Matrix and Its Performance Analysis. 1993-2003
- Orchisama Das, Sebastian J. Schlecht, Enzo De Sena:
Grouped Feedback Delay Networks With Frequency-Dependent Coupling. 2004-2015
- Xudong Zhao, Gongping Huang, Jingdong Chen, Jacob Benesty:
Design of 2D and 3D Differential Microphone Arrays With a Multistage Framework. 2016-2031
- Jia-Hao Hsu, Jeremy Chang, Min-Hsueh Kuo, Chung-Hsien Wu:
Empathetic Response Generation Based on Plug-and-Play Mechanism With Empathy Perturbation. 2032-2042
- Aditya Dutt, Paul D. Gader:
Wavelet Multiresolution Analysis Based Speech Emotion Recognition System Using 1D CNN LSTM Networks. 2043-2054
- Arturo Morales, Juan I. Yuz, Juan P. Cortés, Javier G. Fontanet, Matías Zañartu:
Glottal Airflow Estimation Using Neck Surface Acceleration and Low-Order Kalman Smoothing. 2055-2066
- Yuya Hosoda, Arata Kawamura, Youji Iiguni:
Complex-Domain Pitch Estimation Algorithm for Narrowband Speech Signals. 2067-2078
- Zhidong Liu, Junhui Li, Muhua Zhu:
Alleviating Exposure Bias for Neural Machine Translation via Contextual Augmentation and Self Distillation. 2079-2089
- Hanan Beit-On, Tom Shlomo, Boaz Rafaely:
Weighted Frequency Smoothing for Enhanced Speaker Localization. 2090-2099
- Shan Gao, Xihong Wu, Tianshu Qu:
A Physical Model-Based Self-Supervised Learning Method for Signal Enhancement Under Reverberant Environment. 2100-2110
- Xue Jiang, Xiulian Peng, Huaying Xue, Yuan Zhang, Yan Lu:
Latent-Domain Predictive Neural Speech Coding. 2111-2123
- Shumin Deng, Jiacheng Yang, Hongbin Ye, Chuanqi Tan, Mosha Chen, Songfang Huang, Fei Huang, Huajun Chen, Ningyu Zhang:
LOGEN: Few-Shot Logical Knowledge-Conditioned Text Generation With Self-Training. 2124-2133
- Yuanzhi Liu, Min He, Qingqing Yang, Gwanggil Jeon:
An Unsupervised Framework With Attention Mechanism and Embedding Perturbed Encoder for Non-Parallel Text Sentiment Style Transfer. 2134-2144
- Yang Ai, Zhen-Hua Ling:
APNet: An All-Frame-Level Neural Vocoder Incorporating Direct Prediction of Amplitude and Phase Spectra. 2145-2157
- Fei Zhao, Zhen Wu, Liang He, Xin-Yu Dai:
Label-Correction Capsule Network for Hierarchical Text Classification. 2158-2168
- Cem Subakan, Mirco Ravanelli, Samuele Cornell, François Grondin, Mirko Bronzi:
Exploring Self-Attention Mechanisms for Speech Separation. 2169-2180
- Chenggang Zhang, Jinjiang Liu, Hao Li, Xueliang Zhang:
Neural Multi-Channel and Multi-Microphone Acoustic Echo Cancellation. 2181-2192
- Zheng Liu, Xin Kang, Fuji Ren:
Dual-TBNet: Improving the Robustness of Speech Features via Dual-Transformer-BiLSTM for Speech Emotion Recognition. 2193-2203
- Sandro Cumani, Salvatore Sarni:
The Distributions of Uncalibrated Speaker Verification Scores: A Generative Model for Domain Mismatch and Trial-Dependent Calibration. 2204-2219
- Xi Ai, Bin Fang:
Cross-Modal Language Modeling in Multi-Motion-Informed Context for Lip Reading. 2220-2232
- Andreas Jonas Fuglsig, Jesper Jensen, Zheng-Hua Tan, Lars Søndergaard Bertelsen, Jens Christian Lindof, Jan Østergaard:
Minimum Processing Near-End Listening Enhancement. 2233-2245
- Zhiwen Xie, Runjie Zhu, Jin Liu, Guangyou Zhou, Jimmy Xiangji Huang:
TARGAT: A Time-Aware Relational Graph Attention Model for Temporal Knowledge Graph Embedding. 2246-2258
- Cuilian Zhang, Derek F. Wong, Eddy Sio Kei Lei, Runzhe Zhan, Lidia S. Chao:
Obscurity-Quantified Curriculum Learning for Machine Translation Evaluation. 2259-2271
- Yaxin Liu, Yan Zhou, Ziming Li, Junlin Wang, Wei Zhou, Songlin Hu:
HIM: An End-to-End Hierarchical Interaction Model for Aspect Sentiment Triplet Extraction. 2272-2285
- Yukoh Wakabayashi, Kouei Yamaoka, Nobutaka Ono:
Sound Field Interpolation for Rotation-Invariant Multichannel Array Signal Processing. 2286-2298
- Jesper Kjær Nielsen, Mads Græsbøll Christensen, Jesper Bünsow Boldt:
An Analysis of Traditional Noise Power Spectral Density Estimators Based on the Gaussian Stochastic Volatility Model. 2299-2313
- Karen Gissell Rosero Jacome, Felipe Leonel Grijalva, Bruno Sanches Masiero:
Sound Events Localization and Detection Using Bio-Inspired Gammatone Filters and Temporal Convolutional Neural Networks. 2314-2324
- Lin Yuan, Guoheng Huang, Fenghuan Li, Xiaochen Yuan, Chi-Man Pun, Guo Zhong:
RBA-GCN: Relational Bilevel Aggregation Graph Convolutional Network for Emotion Recognition. 2325-2337
- Samuel Poirot, Stefan Bilbao, Mitsuko Aramaki, Sølvi Ystad, Richard Kronland-Martinet:
A Perceptually Evaluated Signal Model: Collisions Between a Vibrating Object and an Obstacle. 2338-2350
- Julius Richter, Simon Welker, Jean-Marie Lemercier, Bunlong Lay, Timo Gerkmann:
Speech Enhancement and Dereverberation With Diffusion-Based Generative Models. 2351-2364
- Siarhei Y. Barysenka, Vasili I. Vorobiov:
SNR-Based Inter-Component Phase Estimation Using Bi-Phase Prior Statistics for Single-Channel Speech Enhancement. 2365-2381
- Jiandian Zeng, Jiantao Zhou, Caishi Huang:
Exploring Semantic Relations for Social Media Sentiment Analysis. 2382-2394
- Fotios Drakopoulos, Sarah Verhulst:
A Neural-Network Framework for the Design of Individualised Hearing-Loss Compensation. 2395-2409
- Xinbei Ma, Zhuosheng Zhang, Hai Zhao:
Enhanced Speaker-Aware Multi-Party Multi-Turn Dialogue Comprehension. 2410-2423
- Tianrui Wang, Weibin Zhu, Yingying Gao, Shilei Zhang, Junlan Feng:
Harmonic Attention for Monaural Speech Enhancement. 2424-2436
- Lei Lei, Guoshun Yuan, Hongjiang Yu, Dewei Kong, Yuefeng He:
Multilingual Customized Keyword Spotting Using Similar-Pair Contrastive Learning. 2437-2447
- Shaokai Li, Peng Song, Wenming Zheng:
Multi-Source Discriminant Subspace Alignment for Cross-Domain Speech Emotion Recognition. 2448-2460
- Yeqing Ren, Haipeng Peng, Lixiang Li, Xiaopeng Xue, Yang Lan, Yixian Yang:
Generalized Voice Spoofing Detection via Integral Knowledge Amalgamation. 2461-2475
- Xing Chen, Jie Wang, Xiao-Lei Zhang, Weiqiang Zhang, Kunde Yang:
LMD: A Learnable Mask Network to Detect Adversarial Examples for Speaker Verification. 2476-2490
- Benjamin Yen, Yameizhen Li, Yusuke Hioka:
Rotor Noise-Aware Noise Covariance Matrix Estimation for Unmanned Aerial Vehicle Audition. 2491-2506
- Xuechen Liu, Xin Wang, Md. Sahidullah, Jose Patino, Héctor Delgado, Tomi Kinnunen, Massimiliano Todisco, Junichi Yamagishi, Nicholas W. D. Evans, Andreas Nautsch, Kong Aik Lee:
ASVspoof 2021: Towards Spoofed and Deepfake Speech Detection in the Wild. 2507-2522
- Zalán Borsos, Raphaël Marinier, Damien Vincent, Eugene Kharitonov, Olivier Pietquin, Matthew Sharifi, Dominik Roblek, Olivier Teboul, David Grangier, Marco Tagliasacchi, Neil Zeghidour:
AudioLM: A Language Modeling Approach to Audio Generation. 2523-2533
- Xingfeng Li, Xiaohan Shi, Desheng Hu, Yongwei Li, Qingchen Zhang, Zhengxia Wang, Masashi Unoki, Masato Akagi:
Music Theory-Inspired Acoustic Representation for Speech Emotion Recognition. 2534-2547
- Jiachen Lian, Chunlei Zhang, Gopala Krishna Anumanchipalli, Dong Yu:
Unsupervised TTS Acoustic Modeling for TTS With Conditional Disentangled Sequential VAE. 2548-2557 - Arsalan Malik

, Nipun Agarwal
, Harshavardhan Settibhaktini
, Ananthakrishna Chintanpalli
:
Predicting Level-Dependent Changes in Concurrent Vowel Scores Using the 2D-CNN Models. 2558-2566 - Michael Krause

, Meinard Müller
:
Hierarchical Classification for Instrument Activity Detection in Orchestral Music Recordings. 2567-2578 - Julie Meyer

, Sebastian Prepelita
, Ali Khajeh-Saeed
, Michael Smirnov, Pablo Hoffmann:
Verification on Head-Related Transfer Functions of a Snowman Model Simulated Using the Finite-Difference Time-Domain Method. 2579-2591 - Darius Petermann

, Gordon Wichern
, Aswin Shanmugam Subramanian
, Zhong-Qiu Wang
, Jonathan Le Roux
:
Tackling the Cocktail Fork Problem for Separation and Transcription of Real-World Soundtracks. 2592-2605 - Hailong Cao

, Liguo Li
, Conghui Zhu, Muyun Yang
, Tiejun Zhao:
Dual Word Embedding for Robust Unsupervised Bilingual Lexicon Induction. 2606-2615 - Lin Xiao, Pengyu Xu

, Mingyang Song
, Huafeng Liu
, Liping Jing
, Xiangliang Zhang
:
Triple Alliance Prototype Orthotist Network for Long-Tailed Multi-Label Text Classification. 2616-2628 - Juhua Liu

, Qihuang Zhong
, Liang Ding
, Hua Jin
, Bo Du
, Dacheng Tao
:
Unified Instance and Knowledge Alignment Pretraining for Aspect-Based Sentiment Analysis. 2629-2642 - Yiming Zhang

, Hong Yu
, Ruoyi Du, Zheng-Hua Tan
, Wenwu Wang
, Zhanyu Ma
, Yuan Dong
:
ACTUAL: Audio Captioning With Caption Feature Space Regularization. 2643-2657 - Jakob Abeßer

, Sascha Grollmisch
, Meinard Müller
:
How Robust are Audio Embeddings for Polyphonic Sound Event Tagging? 2658-2667 - Wei Xia

, John H. L. Hansen
:
Attention and DCT Based Global Context Modeling for Text-Independent Speaker Recognition. 2668-2679 - Takuya Hasumi, Tomohiko Nakamura

, Norihiro Takamune
, Hiroshi Saruwatari
, Daichi Kitamura
, Yu Takahashi
, Kazunobu Kondo:
PoP-IDLMA: Product-of-Prior Independent Deeply Learned Matrix Analysis for Multichannel Music Source Separation. 2680-2694 - Ben Liu

, Jun Wang
, Guanyuan Yu
, Shaolei Chen:
CUPVC: A Constraint-Based Unsupervised Prosody Transfer for Improving Telephone Banking Services. 2695-2706 - Guinan Li

, Jiajun Deng
, Mengzhe Geng
, Zengrui Jin
, Tianzi Wang, Shujie Hu
, Mingyu Cui, Helen Meng, Xunying Liu
:
Audio-Visual End-to-End Multi-Channel Speech Separation, Dereverberation and Recognition. 2707-2723 - Jean-Marie Lemercier

, Julius Richter
, Simon Welker
, Timo Gerkmann
:
StoRM: A Diffusion-Based Stochastic Regeneration Model for Speech Enhancement and Dereverberation. 2724-2737 - Yen-Ju Lu

, Chia-Yu Chang, Cheng Yu
, Ching-Feng Liu, Jeih-weih Hung
, Shinji Watanabe
, Yu Tsao
:
Improving Speech Enhancement Performance by Leveraging Contextual Broad Phonetic Class Information. 2738-2750 - Sungjae Kim

, Yewon Kim, Jewoo Jun, Injung Kim
:
MuSE-SVS: Multi-Singer Emotional Singing Voice Synthesizer That Controls Emotional Intensity. 2751-2764 - Xinxin Su

, Zhen Huang
, Yunxiang Zhao
, Yifan Chen, Yong Dou, Hengyue Pan:
Recent Trends in Deep Learning Based Textual Emotion Cause Extraction. 2765-2786 - Junyu Lu

, Hongfei Lin
, Xiaokun Zhang
, Zhaoqing Li, Tongyue Zhang, Linlin Zong
, Fenglong Ma
, Bo Xu
:
Hate Speech Detection via Dual Contrastive Learning. 2787-2795 - Diego Marques do Carmo, Ricardo Augusto Borsoi

, Márcio Holsbach Costa
:
Closed-Form Solution to the Multichannel Wiener Filter With Interaural Level Difference Preservation. 2796-2811 - Ya-Jie Zhang

, Chao Zhang
, Wei Song, Zhengchen Zhang, Youzheng Wu, Xiaodong He
:
Prosody Modelling With Pre-Trained Cross-Utterance Representations for Improved Speech Synthesis. 2812-2823 - Ching-Yu Chiu

, Meinard Müller
, Matthew E. P. Davies
, Alvin Wen-Yu Su, Yi-Hsuan Yang
:
Local Periodicity-Based Beat Tracking for Expressive Classical Piano Music. 2824-2835 - Feng Chen

, Ke Ma, Yapeng Mao, Desen Yang, Yi Zhang
, Jie Shi, Shiqi Mo
, Chenyang Gui, Song Li
:
A Novel Method to Design Steerable Differential Beamformer Using Linear Acoustics Vector Sensor Array. 2836-2849 - Tianyu Huang

, Weisheng Dong
, Fangfang Wu
, Xin Li
, Guangming Shi
:
Uncertainty-Driven Knowledge Distillation for Language Model Compression. 2850-2858 - Roberto Andrés Vasco Carofilis

, Enrique Alegre
, Eduardo Fidalgo
, Laura Fernández-Robles
:
Improvement of Accent Classification Models Through Grad-Transfer From Spectrograms and Gradient-Weighted Class Activation Mapping. 2859-2871 - Jacob Hollebon

, Filippo Maria Fazi
:
Higher-Order Stereophony. 2872-2885 - Jeremy Heng Meng Wong

, Huayun Zhang
, Nancy F. Chen
:
Modelling Inter-Rater Uncertainty in Spoken Language Assessment. 2886-2898 - Qinghua Zheng

, Yuefei Wu
, Guangtao Wang
, Yanping Chen
, Wei Wu
, Zai Zhang
, Bin Shi
, Bo Dong
:
Exploring Interactive and Contrastive Relations for Nested Named Entity Recognition. 2899-2909 - Dongyuan Shi

, Woon-Seng Gan
, Bhan Lam
, Zhengding Luo
, Xiaoyi Shen
:
Transferable Latent of CNN-Based Selective Fixed-Filter Active Noise Control. 2910-2921 - Dorian Desblancs

, Vincent Lostanlen
, Romain Hennequin
:
Zero-Note Samba: Self-Supervised Beat Tracking. 2922-2934 - Nankai Lin

, Yingwen Fu
, Xiaotian Lin
, Dong Zhou
, Aimin Yang
, Shengyi Jiang
:
CL-XABSA: Contrastive Learning for Cross-Lingual Aspect-Based Sentiment Analysis. 2935-2946 - Hanmeng Liu

, Jian Liu
, Leyang Cui
, Zhiyang Teng
, Nan Duan
, Ming Zhou
, Yue Zhang
:
LogiQA 2.0 - An Improved Dataset for Logical Reasoning in Natural Language Understanding. 2947-2962 - Jiangyan Yi

, Jianhua Tao
, Ruibo Fu
, Tao Wang
, Chu Yuan Zhang
, Chenglong Wang
:
Adversarial Multi-Task Learning for Mandarin Prosodic Boundary Prediction With Multi-Modal Embeddings. 2963-2973 - Ji Won Yoon

, Hyung Yong Kim
, Hyeonseung Lee
, Sunghwan Ahn
, Nam Soo Kim
:
Oracle Teacher: Leveraging Target Information for Better Knowledge Distillation of CTC Models. 2974-2987 - Sufeng Duan

, Hai Zhao
, Dongdong Zhang
:
Syntax-Aware Data Augmentation for Neural Machine Translation. 2988-2999 - Tongzheng Liu

, Zhihua Lu
, João Paulo C. L. da Costa
, Tai Fei
:
A Hybrid Reverberation Model and Its Application to Joint Speech Dereverberation and Separation. 3000-3014 - Junjun Guo

, Junjie Ye
, Yan Xiang
, Zhengtao Yu
:
Layer-Level Progressive Transformer With Modality Difference Awareness for Multi-Modal Neural Machine Translation. 3015-3026 - Qian Tao

, Zhihao Xiong, Bocheng Han
, Xiaoyang Fan
, Lusi Li
:
A Novel Unsupervised Approach for Cross-Lingual Word Alignment in Low Isomorphic Embedding Spaces. 3027-3041 - Jilu Jin

, Jacob Benesty
, Jingdong Chen
, Gongping Huang
:
Differential Beamforming From a Geometric Perspective. 3042-3054 - Alberto Palomo-Alonso

, David Casillas-Pérez
, Silvia Jiménez-Fernández
, José Antonio Portilla-Figueras
, Sancho Salcedo-Sanz
:
A Flexible Architecture Using Temporal, Spatial and Semantic Correlation-Based Algorithms for Story Segmentation of Broadcast News. 3055-3069 - Bolaji Yusuf

, Jan Cernocký
, Murat Saraçlar
:
End-to-End Open Vocabulary Keyword Search With Multilingual Neural Representations. 3070-3080 - Adrian Herzog

, Srikanth Raj Chetupalli
, Emanuël A. P. Habets
:
AmbiSep: Joint Ambisonic-to-Ambisonic Speech Separation and Noise Reduction. 3081-3094 - Po-Chun Hsu

, Da-Rong Liu
, Andy T. Liu
, Hung-yi Lee
:
Parallel Synthesis for Autoregressive Speech Generation. 3095-3111 - Siddharth Dalmia

, Dmytro Okhonko, Mike Lewis, Sergey Edunov, Shinji Watanabe
, Florian Metze
, Luke Zettlemoyer, Abdelrahman Mohamed:
LegoNN: Building Modular Encoder-Decoder Models. 3112-3126 - Tom Gajecki

, Waldo Nogueira
:
Deep Latent Fusion Layers for Binaural Speech Enhancement. 3127-3138 - Huawen Feng

, Zhenxi Lin
, Qianli Ma
:
Perturbation-Based Self-Supervised Attention for Attention Bias in Text Classification. 3139-3151 - Jiaxin Zhong

, Tao Zhuang
, Mengtong Li
, Ray Kirby
, Mahmoud Karimi
, Jing Lu
, Dong Zhang
:
Sidelobe Suppression for a Steerable Parametric Source Using the Sparse Random Array Technique. 3152-3161 - Yan Fang

, Wei Lu
, Xiaodong Liu
, Witold Pedrycz
, Qi Lang
, Jianhua Yang
:
CircularE: A Complex Space Circular Correlation Relational Model for Link Prediction in Knowledge Graph Embedding. 3162-3175 - Jie Zhang

, Rui Tao, Jun Du
, Li-Rong Dai
:
SDW-SWF: Speech Distortion Weighted Single-Channel Wiener Filter for Noise Reduction. 3176-3189 - Haozhou Li

, Qinke Peng
, Xu Mou
, Ying Wang
, Zeyuan Zeng
, Muhammad Fiaz Bashir
:
Abstractive Financial News Summarization via Transformer-BiLSTM Encoder and Graph Attention-Based Decoder. 3190-3205 - Weitao Yuan

, Shengbei Wang
, Jianming Wang
, Masashi Unoki
, Wenwu Wang
:
Unsupervised Deep Unfolded Representation Learning for Singing Voice Separation. 3206-3220 - Zhong-Qiu Wang

, Samuele Cornell
, Shukjae Choi, Younglo Lee, Byeong-Yeol Kim, Shinji Watanabe
:
TF-GridNet: Integrating Full- and Sub-Band Modeling for Speech Separation. 3221-3236 - Marvin Tammen

, Simon Doclo
:
Parameter Estimation Procedures for Deep Multi-Frame MVDR Filtering for Single-Microphone Speech Enhancement. 3237-3248 - Yi Lin

, Qingyang Wang
, Xincheng Yu
, Zichen Zhang
, Dongyue Guo
, Jizhe Zhou
:
Towards Recognition for Radio-Echo Speech in Air Traffic Control: Dataset and a Contrastive Learning Approach. 3249-3262 - Diego Caviedes-Nozal

, Efren Fernandez-Grande
:
Spatio-Temporal Bayesian Regression for Room Impulse Response Reconstruction With Spherical Waves. 3263-3277 - Xinyu Hu

, Xiaojun Wan
:
RST Discourse Parsing as Text-to-Text Generation. 3278-3289 - Shun Lei

, Yixuan Zhou
, Liyang Chen
, Zhiyong Wu
, Xixin Wu
, Shiyin Kang
, Helen Meng:
MSStyleTTS: Multi-Scale Style Modeling With Hierarchical Context Information for Expressive Speech Synthesis. 3290-3303 - Pedro Izquierdo Lehmann

, Rodrigo F. Cádiz
, Carlos A. Sing-Long
:
Towards Maximizing a Perceptual Sweet Spot for Spatial Sound With Loudspeakers. 3304-3319 - Han Zhu

, Dongji Gao
, Gaofeng Cheng
, Daniel Povey
, Pengyuan Zhang
, Yonghong Yan
:
Alternative Pseudo-Labeling for Semi-Supervised Automatic Speech Recognition. 3320-3330 - Junqing Zhang

, Liming Shi
, Mads Græsbøll Christensen
, Wen Zhang
, Lijun Zhang
, Jingdong Chen
:
CGMM-Based Sound Zone Generation Using Robust Pressure Matching With ATF Perturbation Constraints. 3331-3345 - Erfan Loweimi

, Andrea Carmantini
, Peter Bell
, Steve Renals
, Zoran Cvetkovic
:
Phonetic Error Analysis Beyond Phone Error Rate. 3346-3361 - Runxuan Yang

, Yuyang Peng
, Xiaolin Hu
:
A Fast High-Fidelity Source-Filter Vocoder With Lightweight Neural Modules. 3362-3373 - Yuxiang Zhang

, Zhuo Li, Jingze Lu, Hua Hua, Wenchao Wang
, Pengyuan Zhang
:
The Impact of Silence on Speech Anti-Spoofing. 3374-3389 - Philippe Gonzalez

, Tommy Sonne Alstrøm
, Tobias May
:
Assessing the Generalization Gap of Learning-Based Speech Enhancement Systems in Noisy and Reverberant Environments. 3390-3403 - Ziyi Xu

, Ziyue Zhao
, Tim Fingscheidt
:
Coded Speech Quality Measurement by a Non-Intrusive PESQ-DNN. 3404-3417 - Tao Li

, Chenxu Hu, Jian Cong, Xinfa Zhu, Jingbei Li
, Qiao Tian, Yuping Wang, Lei Xie
:
DiCLET-TTS: Diffusion Model Based Cross-Lingual Emotion Transfer for Text-to-Speech - A Study Between English and Mandarin. 3418-3430 - Xuexin Xu

, Liang Shi
, Xunquan Chen
, Pingyuan Lin
, Jie Lian
, Jinhui Chen
, Zhihong Zhang
, Edwin R. Hancock
:
Any-to-Any Voice Conversion With Multi-Layer Speaker Adaptation and Content Supervision. 3431-3445 - Chenpeng Du

, Yiwei Guo
, Xie Chen
, Kai Yu
:
Speaker Adaptive Text-to-Speech With Timbre-Normalized Vector-Quantized Feature. 3446-3456 - Yash Kumar Atri

, Vikram Goyal
, Tanmoy Chakraborty
:
Multi-Document Summarization Using Selective Attention Span and Reinforcement Learning. 3457-3467 - Maochun Huang

, Chunmei Qing
, Junpeng Tan
, Xiangmin Xu
:
Context-Based Adaptive Multimodal Fusion Network for Continuous Frame-Level Sentiment Prediction. 3468-3477 - Sebastian J. Schlecht

, Jon Fagerström
, Vesa Välimäki
:
Decorrelation in Feedback Delay Networks. 3478-3487 - Jinliang Lu

, Jiajun Zhang
:
Towards Unified Multi-Domain Machine Translation With Mixture of Domain Experts. 3488-3498 - Julien Hauret

, Thomas Joubaud
, Véronique Zimpfer
, Eric Bavu
:
Configurable EBEN: Extreme Bandwidth Extension Network to Enhance Body-Conducted Speech Capture. 3499-3512 - Wanli Peng

, Sheng Li
, Zhenxing Qian
, Xinpeng Zhang
:
Text Steganalysis Based on Hierarchical Supervised Learning and Dual Attention Mechanism. 3513-3526 - Lin Xu

, Qixian Zhou
, Jinlan Fu
, See-Kiong Ng
:
CET2: Modelling Topic Transitions for Coherent and Engaging Knowledge-Grounded Conversations. 3527-3536 - Vincent W. Neo

, Christine Evers
, Stephan Weiss
, Patrick A. Naylor
:
Signal Compaction Using Polynomial EVD for Spherical Array Processing With Applications. 3537-3549 - Gerald Enzner

, Svantje Voit:
Hybrid-Frequency-Resolution Adaptive Kalman Filter for Online Identification of Long Acoustic Responses With Low Input-Output Latency. 3550-3563 - Shang Gao

, Maoshen Jia
, Dingding Yao
, Jing Wang
:
Multi-Source Localization Using Optimized Time-Frequency Representation and Sparsity Component Analysis. 3564-3578 - He Qi

, Mingjie Gao
, Ka Fai Cedric Yiu
, Sven Nordholm
:
Distributed Microphone Array Localization Problem via SDP-SOCP Method. 3579-3588 - Hiroshi Sawada

, Rintaro Ikeshita
, Keisuke Kinoshita
, Tomohiro Nakatani
:
Multi-Frame Full-Rank Spatial Covariance Analysis for Underdetermined Blind Source Separation and Dereverberation. 3589-3602 - Hongyang Chang

, Hongfei Xu
, Josef van Genabith
, Deyi Xiong
, Hongying Zan
:
JoinER-BART: Joint Entity and Relation Extraction With Constrained Decoding, Representation Reuse and Fusion. 3603-3616 - Xinqi Huang

, Yingsong Li
, Yuriy V. Zakharov
, Yongchun Miao
, Zhixiang Huang
:
Squared Sine Adaptive Algorithm and Its Performance Analysis. 3617-3628 - Andong Li

, Guochen Yu
, Chengshi Zheng
, Wenzhe Liu
, Xiaodong Li
:
A General Unfolding Speech Enhancement Method Motivated by Taylor's Theorem. 3629-3646 - Bin Gu

, Jie Zhang
, Wu Guo
:
A Dynamic Convolution Framework for Session-Independent Speaker Embedding Learning. 3647-3658 - Daojian Zeng

, Chao Zhao
, Chao Jiang
, Jianling Zhu
, Jianhua Dai
:
Document-Level Relation Extraction With Context Guided Mention Integration and Inter-Pair Reasoning. 3659-3666 - Lu Li

, Maoshen Jia
, Jing Wang
, Ruiyuan Cao
:
Multiple-Speech-Source DOA Estimation Based on Single-Source Cluster Detection. 3667-3680 - Xiaoxiao Miao

, Xin Wang
, Erica Cooper
, Junichi Yamagishi
, Natalia A. Tomashenko
:
Speaker Anonymization Using Orthogonal Householder Neural Network. 3681-3695 - Zhengshan Xue

, Xiaolei Zhang, Tingxun Shi
, Deyi Xiong
:
DetTrans: A Lightweight Framework to Detect and Translate Noisy Inputs Simultaneously. 3696-3705 - Chang Liu

, Zhen-Hua Ling
, Ling-Hui Chen
:
Pronunciation Dictionary-Free Multilingual Speech Synthesis Using Learned Phonetic Representations. 3706-3716 - Reo Yoneyama

, Yi-Chiao Wu
, Tomoki Toda
:
High-Fidelity and Pitch-Controllable Neural Vocoder Based on Unified Source-Filter Networks. 3717-3729 - Stefan Thaleiser

, Gerald Enzner
:
Binaural-Projection Multichannel Wiener Filter for Cue-Preserving Binaural Speech Enhancement. 3730-3745 - Yixin Wang

, Wei Wei
, Xiangming Gu
, Xiaohong Guan
, Ye Wang
:
Disentangled Adversarial Domain Adaptation for Phonation Mode Detection in Singing and Speech. 3746-3759 - Yixuan Zhang

, Heming Wang
, DeLiang Wang
:
$F0$ Estimation and Voicing Detection With Cascade Architecture in Noisy Speech. 3760-3770 - Zhengdao Zhao

, Yuhua Wang
, Guang Shen
, Yuezhu Xu
, Jiayuan Zhang
:
TDFNet: Transformer-Based Deep-Scale Fusion Network for Multimodal Emotion Recognition. 3771-3782 - Johannes M. Arend

, Christoph Pörschmann
, Stefan Weinzierl
, Fabian Brinkmann
:
Magnitude-Corrected and Time-Aligned Interpolation of Head-Related Transfer Functions. 3783-3799 - Desh Raj

, Daniel Povey
, Sanjeev Khudanpur
:
SURT 2.0: Advances in Transducer-Based Multi-Talker Speech Recognition. 3800-3813 - Jiaming An

, Zixiang Ding
, Ke Li
, Rui Xia
:
Global-View and Speaker-Aware Emotion Cause Extraction in Conversations. 3814-3823 - Yuqin Lin

, Longbiao Wang
, Yanbing Yang
, Jianwu Dang
:
CFDRN: A Cognition-Inspired Feature Decomposition and Recombination Network for Dysarthric Speech Recognition. 3824-3836 - Rémi Blandin

, Simon Stone
, Angélique Remacle
, Vincent Didone, Peter Birkholz
:
A Comparative Study of 3D and 1D Acoustic Simulations of the Higher Frequencies of Speech. 3837-3847 - Qing Wang

, Jixun Yao
, Li Zhang
, Pengcheng Guo
, Lei Xie
:
Timbre-Reserved Adversarial Attack in Speaker Identification. 3848-3858 - Yachao Li

, Junhui Li
, Jing Jiang
, Shimin Tao, Hao Yang
, Min Zhang:
P-Transformer: Towards Better Document-to-Document Neural Machine Translation. 3859-3870 - Chao Xie

, Tomoki Toda
:
Noisy-to-Noisy Voice Conversion Under Variations of Noisy Condition. 3871-3882 - Zhichao Wang

, Xinsheng Wang
, Qicong Xie, Tao Li
, Lei Xie
, Qiao Tian, Yuping Wang:
MSM-VC: High-Fidelity Source Style Transfer for Non-Parallel Voice Conversion by Multi-Scale Style Modeling. 3883-3895 - Yilin Zhao

, Hai Zhao
, Sufeng Duan
:
Multi-Grained Evidence Inference for Multi-Choice Reading Comprehension. 3896-3907 - Ye-Qian Du

, Jie Zhang
, Xin Fang, Ming-Hui Wu, Zhouwang Yang
:
A Semi-Supervised Complementary Joint Training Approach for Low-Resource Speech Recognition. 3908-3921 - Changheng Li

, Richard C. Hendriks
:
Alternating Least-Squares-Based Microphone Array Parameter Estimation for a Single-Source Reverberant and Noisy Acoustic Scenario. 3922-3934 - Kun Zhou

, Yuanhang Zhou, Wayne Xin Zhao
, Ji-Rong Wen
:
Learning to Perturb for Contrastive Learning of Unsupervised Sentence Representations. 3935-3944 - Georg Götz

, Sebastian J. Schlecht
, Ville Pulkki
:
Common-Slope Modeling of Late Reverberation. 3945-3957 - Guanhua Chen

, Runzhe Zhan
, Derek F. Wong
, Lidia S. Chao
:
Multi-Level Curriculum Learning for Multi-Turn Dialogue Generation. 3958-3967 - Yun-Yen Chuang

, Hung-Min Hsu
, Kevin Lin
, Ray-I Chang
, Hung-Yi Lee
:
MetaEx-GAN: Meta Exploration to Improve Natural Language Generation via Generative Adversarial Networks. 3968-3980 - Chuxuan Tong

, Xi Zheng
, Jianhua Li
, Xingjun Ma
, Longxiang Gao
, Yong Xiang
:
Query-Efficient Black-Box Adversarial Attacks on Automatic Speech Recognition. 3981-3992 - Xixin Wu

, Hui Lu
, Kun Li
, Zhiyong Wu
, Xunying Liu
, Helen Meng
:
Hiformer: Sequence Modeling Networks With Hierarchical Attention Mechanisms. 3993-4003 - Ante Wang

, Linfeng Song
, Lifeng Jin
, Junfeng Yao
, Haitao Mi, Chen Lin
, Jinsong Su
, Dong Yu
:
D$^{2}$PSG: Multi-Party Dialogue Discourse Parsing as Sequence Generation. 4004-4013 - Nan Gao

, Yongjian Wang
, Peng Chen
, Jijun Tang
:
Boosting Short Text Classification by Solving the OOV Problem. 4014-4024
