


IEEE/ACM Transactions on Audio, Speech and Language Processing, Volume 28
Volume 28, 2020
- Jamal Amini, Richard Christian Hendriks, Richard Heusdens, Meng Guo, Jesper Jensen:
Rate-Constrained Noise Reduction in Wireless Acoustic Sensor Networks. 1-12
- Chitralekha Gupta, Haizhou Li, Ye Wang:
Automatic Leaderboard: Evaluation of Singing Quality Without a Standard Reference. 13-26
- Sefik Emre Eskimez, Ross K. Maddox, Chenliang Xu, Zhiyao Duan:
Noise-Resilient Training Method for Face Landmark Generation From Speech. 27-38
- Peidong Wang, Ke Tan, DeLiang Wang:
Bridging the Gap Between Monaural Speech Enhancement and Recognition With Distortion-Independent Acoustic Modeling. 39-48
- Yuki Mitsufuji, Stefan Uhlich, Norihiro Takamune, Daichi Kitamura, Shoichi Koyama, Hiroshi Saruwatari:
Multichannel Non-Negative Matrix Factorization Using Banded Spatial Covariance Matrices in Wavenumber Domain. 49-60
- Yaron Laufer, Sharon Gannot:
Scoring-Based ML Estimation and CRBs for Reverberation, Speech, and Noise PSDs in a Spatially Homogeneous Noise Field. 61-76
- Naveen Kumar Desiraju, Simon Doclo, Markus Buck, Tobias Wolff:
Online Estimation of Reverberation Parameters For Late Residual Echo Suppression. 77-91
- Mehdi Zohourian, Rainer Martin:
Binaural Direct-to-Reverberant Energy Ratio and Speaker Distance Estimation. 92-104
- Youhyun Shin, Sang-goo Lee:
Learning Context Using Segment-Level LSTM for Neural Sequence Labeling. 105-115
- Gongping Huang, Jingdong Chen, Jacob Benesty:
Design of Planar Differential Microphone Arrays With Fractional Orders. 116-130
- Ming-Hsiang Su, Chung-Hsien Wu, Liang-Yu Chen:
Attention-Based Response Generation Using Parallel Double Q-Learning for Dialog Policy Decision in a Conversational System. 131-143
- Satoru Emura:
Wave-Domain Residual Echo Reduction Using Subspace Tracking. 144-156
- Xin Wang, Shinji Takaki, Junichi Yamagishi, Simon King, Keiichi Tokuda:
A Vector Quantized Variational Autoencoder (VQ-VAE) Autoregressive Neural F0 Model for Statistical Parametric Speech Synthesis. 157-170
- Falk-Martin Hoffmann, Philip Arthur Nelson, Filippo Maria Fazi:
DOA Estimation Performance With Circular Arrays in Sound Fields With Finite Rate of Innovation. 171-184
- Rongfeng Su, Xunying Liu, Lan Wang, Jingzhou Yang:
Cross-Domain Deep Visual Feature Generation for Mandarin Audio-Visual Speech Recognition. 185-197
- Titouan Parcollet, Mohamed Morchid, Xavier Bost, Georges Linarès, Renato De Mori:
Real to H-Space Autoencoders for Theme Identification in Telephone Conversations. 198-210
- Antonio Canclini, Fabio Antonacci, Stefano Tubaro, Augusto Sarti:
A Methodology for the Robust Estimation of the Radiation Pattern of Acoustic Sources. 211-224
- Yi Yu, Hongsen He, Badong Chen, Jianghui Li, Youwen Zhang, Lu Lu:
M-Estimate Based Normalized Subband Adaptive Filter Algorithm: Performance Analysis and Improvements. 225-239
- Haoxiang Wen, Senquan Yang, Yuanquan Hong, Huan Luo:
A Partial Update Adaptive Algorithm for Sparse System Identification. 240-255
- Martin Bo Møller, Jan Østergaard:
A Moving Horizon Framework for Sound Zones. 256-265
- Stylianos Ioannis Mimilakis, Konstantinos Drossos, Estefanía Cano, Gerald Schuller:
Examining the Mapping Functions of Denoising Autoencoders in Singing Voice Separation. 266-278
- Lachlan Birnie, Thushara D. Abhayapala, Prasanga N. Samarasinghe:
Reflection Assisted Sound Source Localization Through a Harmonic Domain MUSIC Framework. 279-293
- Wenhao Ding, Liang He:
Adaptive Multi-Scale Detection of Acoustic Events. 294-306
- Weijian Zhang, Peng Song:
Transfer Sparse Discriminant Subspace Learning for Cross-Corpus Speech Emotion Recognition. 307-318
- Bidisha Sharma, Ye Wang:
Automatic Evaluation of Song Intelligibility Using Singing Adapted STOI and Vocal-Specific Features. 319-331
- Hai Morgenstern, Boaz Rafaely:
Perceptually-Transparent Online Estimation of Two-Channel Room Transfer Function for Sound Calibration. 332-342
- Shaojin Ding, Guanlong Zhao, Christopher Liberatore, Ricardo Gutierrez-Osuna:
Learning Structured Sparse Representations for Voice Conversion. 343-354
- Mireia Díez, Lukás Burget, Federico Landini, Jan Cernocký:
Analysis of Speaker Diarization Based on Bayesian HMM With Eigenvoice Priors. 355-368
- Jia-Chen Gu, Zhen-Hua Ling, Quan Liu:
Utterance-to-Utterance Interactive Matching Network for Multi-Turn Response Selection in Retrieval-Based Chatbots. 369-379
- Ke Tan, DeLiang Wang:
Learning Complex Spectral Mapping With Gated Convolutional Recurrent Networks for Monaural Speech Enhancement. 380-390
- Richeng Duan, Tatsuya Kawahara, Masatake Dantsuji, Hiroaki Nanjo:
Cross-Lingual Transfer Learning of Non-Native Acoustic Modeling for Pronunciation Error Detection and Diagnosis. 391-401
- Xin Wang, Shinji Takaki, Junichi Yamagishi:
Neural Source-Filter Waveform Models for Statistical Parametric Speech Synthesis. 402-415
- Sanjeel Parekh, Slim Essid, Alexey Ozerov, Ngoc Q. K. Duong, Patrick Pérez, Gaël Richard:
Weakly Supervised Representation Learning for Audio-Visual Scene Analysis. 416-428
- Jianfei Yu, Jing Jiang, Rui Xia:
Entity-Sensitive Attention and Fusion Network for Entity-Level Multimodal Sentiment Classification. 429-439
- John G. Beerends, Niels M. P. Neumann, Egon L. van den Broek, Anna Llagostera Casanovas, Jovana Torres Menendez, Christian Schmidmer, Jens Berger:
Subjective and Objective Assessment of Full Bandwidth Speech Quality. 440-449
- Vikram C. Mathad, S. R. Mahadeva Prasanna:
Vowel Onset Point Based Screening of Misarticulated Stops in Cleft Lip and Palate Speech. 450-460
- Minh Nguyen, Gia H. Ngo, Nancy F. Chen:
Hierarchical Character Embeddings: Learning Phonological and Semantic Representations in Languages of Logographic Origin Using Recursive Neural Networks. 461-473
- Dani Cherkassky, Sharon Gannot:
Successive Relative Transfer Function Identification Using Blind Oblique Projection. 474-486
- Ivo Trowitzsch, Christopher Schymura, Dorothea Kolossa, Klaus Obermayer:
Joining Sound Event Detection and Localization Through Spatial Segregation. 487-502
- Shinichi Mogami, Norihiro Takamune, Daichi Kitamura, Hiroshi Saruwatari, Yu Takahashi, Kazunobu Kondo, Nobutaka Ono:
Independent Low-Rank Matrix Analysis Based on Time-Variant Sub-Gaussian Source Model for Determined Blind Source Separation. 503-518
- Hamzeh Ghasemzadeh, Meisam Khalil Arjmandi:
Toward Optimum Quantification of Pathology-Induced Noises: An Investigation of Information Missed by Human Auditory System. 519-528
- Fei Ma, Wen Zhang, Thushara Dheemantha Abhayapala:
Active Control of Outgoing Broadband Noise Fields in Rooms. 529-539
- Jing-Xuan Zhang, Zhen-Hua Ling, Li-Rong Dai:
Non-Parallel Sequence-to-Sequence Voice Conversion With Disentangled Linguistic and Speaker Representations. 540-552
- Tao Dai, Li Zhu, Yaxiong Wang, Kathleen M. Carley:
Attentive Stacked Denoising Autoencoder With Bi-LSTM for Personalized Context-Aware Citation Recommendation. 553-568
- Yuta Nishimura, Katsuhito Sudoh, Graham Neubig, Satoshi Nakamura:
Multi-Source Neural Machine Translation With Missing Data. 569-580
- Jin Wang, Liang-Chih Yu, K. Robert Lai, Xuejie Zhang:
Tree-Structured Regional CNN-LSTM Model for Dimensional Sentiment Analysis. 581-591
- Abul Azad, Lamine Mili:
Robust Speech Filter and Voice Encoder Parameter Estimation Using the Phase-Phase Correlator. 592-604
- Abdullah Fahim, Prasanga N. Samarasinghe, Thushara D. Abhayapala:
Multi-Source DOA Estimation Through Pattern Recognition of the Modal Coherence of a Reverberant Soundfield. 605-618
- Yaron Laufer, Bracha Laufer-Goldshtein, Sharon Gannot:
ML Estimation and CRBs for Reverberation, Speech, and Noise PSDs in Rank-Deficient Noise Field. 619-634
- Zhongqing Wang, Qingying Sun, Shoushan Li, Qiaoming Zhu, Guodong Zhou:
Neural Stance Detection With Hierarchical Linguistic Representations. 635-645
- Ruizhi Li, Xiaofei Wang, Sri Harish Mallidi, Shinji Watanabe, Takaaki Hori, Hynek Hermansky:
Multi-Stream End-to-End Speech Recognition. 646-655
- Yu Maeno, Yuki Mitsufuji, Prasanga N. Samarasinghe, Naoki Murata, Thushara D. Abhayapala:
Spherical-Harmonic-Domain Feedforward Active Noise Control Using Sparse Decomposition of Reference Signals from Distributed Sensor Arrays. 656-670
- Qingyu Zhou, Nan Yang, Furu Wei, Shaohan Huang, Ming Zhou, Tiejun Zhao:
A Joint Sentence Scoring and Selection Framework for Neural Extractive Document Summarization. 671-681
- Ivan Kukanov, Trung Ngo Trong, Ville Hautamäki, Sabato Marco Siniscalchi, Valerio Mario Salerno, Kong Aik Lee:
Maximal Figure-of-Merit Framework to Detect Multi-Label Phonetic Features for Spoken Language Recognition. 682-695
- Shoichi Koyama, Gilles Chardon, Laurent Daudet:
Optimizing Source and Sensor Placement for Sound Field Control: An Overview. 696-714
- Atsushi Ando, Ryo Masumura, Hosana Kamiyama, Satoshi Kobashikawa, Yushi Aono, Tomoki Toda:
Customer Satisfaction Estimation in Contact Center Calls Based on a Hierarchical Multi-Task Model. 715-728
- Thomas Dietzen, Simon Doclo, Marc Moonen, Toon van Waterschoot:
Integrated Sidelobe Cancellation and Linear Prediction Kalman Filter for Joint Multi-Microphone Speech Dereverberation, Interfering Speech Cancellation, and Noise Reduction. 740-754
- Thomas Dietzen, Simon Doclo, Marc Moonen, Toon van Waterschoot:
Square Root-Based Multi-Source Early PSD Estimation and Recursive RETF Update in Reverberant Environments by Means of the Orthogonal Procrustes Problem. 755-769
- Liwen Zhang, Ziqiang Shi, Jiqing Han:
Pyramidal Temporal Pooling With Discriminative Mapping for Audio Classification. 770-784
- Mengfan Zhang, Zhongshu Ge, Tiejun Liu, Xihong Wu, Tianshu Qu:
Modeling of Individual HRTFs Based on Spatial Principal Component Analysis. 785-797
- Laureano Moro-Velázquez, Estefanía Hernández-García, Jorge Andrés Gómez García, Juan Ignacio Godino-Llorente, Najim Dehak:
Analysis of the Effects of Supraglottal Tract Surgical Procedures in Automatic Speaker Recognition Performance. 798-812
- Yijia Liu, Wanxiang Che, Bing Qin, Ting Liu:
Exploring Segment Representations for Neural Semi-Markov Conditional Random Fields. 813-824
- Morten Kolbæk, Zheng-Hua Tan, Søren Holdt Jensen, Jesper Jensen:
On Loss Functions for Supervised Monaural Time-Domain Speech Enhancement. 825-838
- Yang Ai, Zhen-Hua Ling:
A Neural Vocoder With Hierarchical Generation of Amplitude and Phase Spectra for Statistical Parametric Speech Synthesis. 839-851
- Dongyan Yu, Huiping Duan, Jun Fang, Bing Zeng:
Predominant Instrument Recognition Based on Deep Neural Network With Auxiliary Classification. 852-861
- Ali Aroudi, Simon Doclo:
Cognitive-Driven Binaural Beamforming Using EEG-Based Auditory Attention Decoding. 862-875
- Christopher Gribben, Hyunkook Lee:
The Perception of Band-Limited Decorrelation Between Vertically Oriented Loudspeakers. 876-888
- Olivier Perrotin, Ian Vince McLoughlin:
Glottal Flow Synthesis for Whisper-to-Speech Conversion. 889-900
- Gongping Huang, Jacob Benesty, Israel Cohen, Jingdong Chen:
Differential Beamforming on Graphs. 901-913
- Bracha Laufer-Goldshtein, Ronen Talmon, Sharon Gannot:
Global and Local Simplex Representations for Multichannel Source Separation. 914-928
- Henning F. Schepker, Sven Nordholm, Simon Doclo:
Acoustic Feedback Suppression for Multi-Microphone Hearing Devices Using a Soft-Constrained Null-Steering Beamformer. 929-940
- Zhong-Qiu Wang, DeLiang Wang:
Deep Learning Based Target Cancellation for Speech Dereverberation. 941-950
- Yeongseok Kim, Youngjin Park:
Blockwise Weighted Least Square Active Noise Control for CPU-GPU Architecture. 951-963
- Odette Scharenborg, Lucas Ondel, Shruti Palaskar, Philip Arthur, Francesco Ciannella, Mingxing Du, Elin Larsen, Danny Merkx, Rachid Riad, Liming Wang, Emmanuel Dupoux, Laurent Besacier, Alan W. Black, Mark Hasegawa-Johnson, Florian Metze, Graham Neubig, Sebastian Stüker, Pierre Godard, Markus Müller:
Speech Technology for Unwritten Languages. 964-975
- Andros Tjandra, Sakriani Sakti, Satoshi Nakamura:
Machine Speech Chain. 976-989
- M. Khadem-hosseini, Shahrokh Ghaemmaghami, Azra Abtahi, Saeed Gazor, Farrokh Marvasti:
Error Correction in Pitch Detection Using a Deep Learning Based Classification. 990-999
- Enzo De Sena, Zoran Cvetkovic, Hüseyin Hacihabiboglu, Marc Moonen, Toon van Waterschoot:
Localization Uncertainty in Time-Amplitude Stereophonic Reproduction. 1000-1015
- Vera Erbes, Sascha Spors:
Localisation Properties of Wave Field Synthesis in a Listening Room. 1016-1024
- Jia Pan, Genshun Wan, Jun Du, Zhongfu Ye:
Online Speaker Adaptation Using Memory-Aware Networks for Speech Recognition. 1025-1037
- Weicheng Cai, Jinkun Chen, Jun Zhang, Ming Li:
On-the-Fly Data Loader and Utterance-Level Aggregation for Speaker and Language Recognition. 1038-1051
- George Sterpu, Christian Saam, Naomi Harte:
How to Teach DNNs to Pay Attention to the Visual Modality in Speech Recognition. 1052-1064
- Christopher Schymura, Dorothea Kolossa:
Audiovisual Speaker Tracking Using Nonlinear Dynamical Systems With Dynamic Stream Weights. 1065-1078
- Gongping Huang, Jacob Benesty, Israel Cohen, Jingdong Chen:
A Simple Theory and New Method of Differential Beamforming With Uniform Linear Microphone Arrays. 1079-1093
- Chung-Ying Ho, Kuo-Kai Shyu, Cheng-Yuan Chang, Sen M. Kuo:
Efficient Narrowband Noise Cancellation System Using Adaptive Line Enhancer. 1094-1103
- Aditya Arie Nugraha, Kouhei Sekiguchi, Kazuyoshi Yoshii:
A Flow-Based Deep Latent Variable Model for Speech Spectrogram Modeling and Enhancement. 1104-1117
- Beat Gfeller, Christian Havnø Frank, Dominik Roblek, Matthew Sharifi, Marco Tagliasacchi, Mihajlo Velimirovic:
SPICE: Self-Supervised Pitch Estimation. 1118-1128
- Christoph Urbanietz, Gerald Enzner:
Direct Spatial-Fourier Regression of HRIRs from Multi-Elevation Continuous-Azimuth Recordings. 1129-1142
- Yaakov Buchris, Israel Cohen, Jacob Benesty, Alon Amar:
Joint Sparse Concentric Array Design for Frequency and Rotationally Invariant Beampattern. 1143-1158
- Tharindu Fernando, Sridha Sridharan, Mitchell McLaren, Darshana Priyasad, Simon Denman, Clinton Fookes:
Temporarily-Aware Context Modeling Using Generative Adversarial Networks for Speech Activity Detection. 1159-1169
- Haipeng Sun, Rui Wang, Kehai Chen, Masao Utiyama, Eiichiro Sumita, Tiejun Zhao:
Unsupervised Neural Machine Translation With Cross-Lingual Language Representation Agreement. 1170-1182
- Qiaoling Zhang, WeiQiang Xu, Weiwei Zhang, Jie Feng, Zhiyong Chen:
Multi-Hypothesis Square-Root Cubature Kalman Particle Filter for Speaker Tracking in Noisy and Reverberant Environments. 1183-1197
- Yinhe Zheng, Guanyi Chen, Minlie Huang:
Out-of-Domain Detection for Natural Language Understanding in Dialog Systems. 1198-1209
- Ina Kodrasi, Hervé Bourlard:
Spectro-Temporal Sparsity Characterization for Dysarthric Speech Detection. 1210-1222
- Bharat Padi, Anand Mohan, Sriram Ganapathy:
Towards Relevance and Sequence Modeling in Language Recognition. 1223-1232
- Iván López-Espejo, Zheng-Hua Tan, Jesper Jensen:
Improved External Speaker-Robust Keyword Spotting for Hearing Assistive Devices. 1233-1247
- Vishnuvardhan Varanasi, Harshit Gupta, Rajesh M. Hegde:
A Deep Learning Framework for Robust DOA Estimation Using Spherical Harmonic Decomposition. 1248-1259
- Sahar Hashemgeloogerdi, Mark F. Bocko:
Adaptive Feedback Cancellation in Hearing Aids Based on Orthonormal Basis Functions With Prediction-Error Method Based Prewhitening. 1260-1269
- Maximo Cobos, Fabio Antonacci, Luca Comanducci, Augusto Sarti:
Frequency-Sliding Generalized Cross-Correlation: A Sub-Band Time Delay Estimation Approach. 1270-1281
- Yingying Zhu, Haiquan Zhao, Xiangping Zeng, Badong Chen:
Robust Generalized Maximum Correntropy Criterion Algorithms for Active Noise Control. 1282-1292
- Hassan Taherian, Zhong-Qiu Wang, Jorge Chang, DeLiang Wang:
Robust Speaker Recognition Based on Single-Channel and Multi-Channel Speech Enhancement. 1293-1302
- Cunhang Fan, Jianhua Tao, Bin Liu, Jiangyan Yi, Zhengqi Wen, Xuefei Liu:
End-to-End Post-Filter for Speech Separation With Deep Attention Fusion Features. 1303-1314
- T. Lavanya, T. Nagarajan, P. Vijayalakshmi:
Multi-Level Single-Channel Speech Enhancement Using a Unified Framework for Estimating Magnitude and Phase Spectra. 1315-1327
- Adrien Ycart, Emmanouil Benetos:
Learning and Evaluation Methodologies for Polyphonic Music Sequence Prediction With LSTMs. 1328-1341
- Takatomo Kano, Sakriani Sakti, Satoshi Nakamura:
End-to-End Speech Translation With Transcoding by Multi-Task Learning for Distant Language Pairs. 1342-1355
- Huanyu Zuo, Prasanga N. Samarasinghe, Thushara D. Abhayapala:
Intensity Based Spatial Soundfield Reproduction Using an Irregular Loudspeaker Array. 1356-1369
- Chenglin Xu, Wei Rao, Eng Siong Chng, Haizhou Li:
SpEx: Multi-Scale Time Domain Speaker Extraction Network. 1370-1384
- Wangyou Zhang, Xuankai Chang, Yanmin Qian, Shinji Watanabe:
Improving End-to-End Single-Channel Multi-Talker Speech Recognition. 1385-1394
- Alakananda Vempala, Eduardo Blanco:
Extracting Biographical Spatial Timelines: Corpus and Experiments. 1395-1403
- Qiquan Zhang, Aaron Nicolson, Mingjiang Wang, Kuldip K. Paliwal, Chenxu Wang:
DeepMMSE: A Deep Learning Approach to MMSE-Based Noise Power Spectral Density Estimation. 1404-1415
- Dhananjay Ram, Lesly Miculicich, Hervé Bourlard:
Neural Network Based End-to-End Query by Example Spoken Term Detection. 1416-1427
- Enea Ceolini, Ilya Kiselev, Shih-Chii Liu:
Evaluating Multi-Channel Multi-Device Speech Separation Algorithms in the Wild: A Hardware-Software Solution. 1428-1439
- Su Zhu, Zijian Zhao, Rao Ma, Kai Yu:
Prior Knowledge Driven Label Embedding for Slot Filling in Natural Language Understanding. 1440-1451
- Haoran Miao, Gaofeng Cheng, Pengyuan Zhang, Yonghong Yan:
Online Hybrid CTC/Attention End-to-End Automatic Speech Recognition Architecture. 1452-1465