Takao Kobayashi, Keikichi Hirose, Satoshi Nakamura (Eds.):
INTERSPEECH 2010, 11th Annual Conference of the International Speech Communication Association, Makuhari, Chiba, Japan, September 26-30, 2010.
ISCA 2010
Keynotes
- Steve Young:
Still talking to machines (cognitively speaking).
1-10
- Tohru Ifukube:
Sound-based assistive technology supporting "seeing", "hearing" and "speaking" for the disabled and the elderly.
11-19
- Chiu-yu Tseng:
Beyond sentence prosody.
20-29
Special Session:
Models of Speech - In Search of Better Representations
- Hosung Nam, Vikramjit Mitra, Mark Tiede, Elliot Saltzman, Louis Goldstein, Carol Y. Espy-Wilson, Mark Hasegawa-Johnson:
A procedure for estimating gestural scores from natural speech.
30-33
- Yen-Liang Shue, Gang Chen, Abeer Alwan:
On the interdependencies between voice quality, glottal gaps, and voice-source related acoustic measures.
34-37
- Hideki Kawahara, Masanori Morise, Toru Takahashi, Hideki Banno, Ryuichi Nisimura, Toshio Irino:
Simplification and extension of non-periodic excitation source representations for high-quality speech manipulation systems.
38-41
- Sadao Hiroya, Takemi Mochida:
Phase equalization-based autoregressive model of speech signals.
42-45
- Yi Xu, Santitham Prom-on:
Articulatory-functional modeling of speech prosody: a review.
46-49
- Humberto M. Torres, Hansjörg Mixdorff, Jorge A. Gurlekian, Hartmut R. Pfitzinger:
Two new estimation methods for a superpositional intonation model.
50-53
ASR:
Acoustic Models I-III
- Simon Wiesler, Georg Heigold, Markus Nußbaum-Thom, Ralf Schlüter, Hermann Ney:
A discriminative splitting criterion for phonetic decision trees.
54-57
- Mark J. F. Gales, Kai Yu:
Canonical state models for automatic speech recognition.
58-61
- Pierre L. Dognin, John R. Hershey, Vaibhava Goel, Peder A. Olsen:
Restructuring exponential family mixture models.
62-65
- Françoise Beaufays, Vincent Vanhoucke, Brian Strope:
Unsupervised discovery and training of maximally dissimilar cluster models.
66-69
- Khe Chai Sim:
Probabilistic state clustering using conditional random field for context-dependent acoustic modelling.
70-73
- Xie Sun, Yunxin Zhao:
Integrate template matching and statistical modeling for speech recognition.
74-77
- George Saon, Hagen Soltau:
Boosting systems for LVCSR.
1341-1344
- Vaibhava Goel, Tara N. Sainath, Bhuvana Ramabhadran, Peder A. Olsen, David Nahamoo, Dimitri Kanevsky:
Incorporating sparse representation phone identification features in automatic speech recognition using exponential families.
1345-1348
- Xin Chen, Yunxin Zhao:
Integrating MLP features and discriminative training in data sampling based ensemble acoustic modeling.
1349-1352
- Jui-Ting Huang, Mark Hasegawa-Johnson:
Semi-supervised training of Gaussian mixture models by conditional entropy minimization.
1353-1356
- Guangchuan Shi, Yu Shi, Qiang Huo:
A study of irrelevant variability normalization based training and unsupervised online adaptation for LVCSR.
1357-1360
- Roger Hsiao, Florian Metze, Tanja Schultz:
Improvements to generalized discriminative feature transformation for speech recognition.
1361-1364
- Karel Veselý, Lukas Burget, Frantisek Grézl:
Parallel training of neural networks for speech recognition.
2934-2937
- Rita Singh, Benjamin Lambert, Bhiksha Raj:
The use of sense in unsupervised training of acoustic models for ASR systems.
2938-2941
- Jun Du, Yu Hu, Hui Jiang:
Boosted mixture learning of Gaussian mixture HMMs for speech recognition.
2942-2945
- Volker Leutnant, Reinhold Haeb-Umbach:
On the exploitation of hidden Markov models and linear dynamic models in a hybrid decoder architecture for continuous speech recognition.
2946-2949
- Alberto Abad, Thomas Pellegrini, Isabel Trancoso, João Paulo Neto:
Context dependent modelling approaches for hybrid speech recognizers.
2950-2953
- Yotaro Kubo, Shinji Watanabe, Atsushi Nakamura, Tetsunori Kobayashi:
A regularized discriminative training method of acoustic models derived by minimum relative entropy discrimination.
2954-2957
- Hank Liao, Christopher Alberti, Michiel Bacchiani, Olivier Siohan:
Decision tree state clustering with word and syllable features.
2958-2961
- Hiroshi Fujimura, Takashi Masuko, Mitsuyoshi Tachimori:
A duration modeling technique with incremental speech rate normalization.
2962-2965
- Martin Wöllmer, Yang Sun, Florian Eyben, Björn Schuller:
Long short-term memory networks for noise robust speech recognition.
2966-2969
- Tsuneo Nitta, Takayuki Onoda, Masashi Kimura, Yurie Iribe, Kouichi Katsurada:
One-model speech recognition and synthesis based on articulatory movement HMMs.
2970-2973
- Xiaodong Cui, Jian Xue, Pierre L. Dognin, Upendra V. Chaudhari, Bowen Zhou:
Acoustic modeling with bootstrap and restructuring for low-resourced languages.
2974-2977
- Tetsuo Kosaka, Keisuke Goto, Takashi Ito, Masaharu Katoh:
Lecture speech recognition by combining word graphs of various acoustic models.
2978-2981
- Khe Chai Sim, Shilin Liu:
Semi-parametric trajectory modelling using temporally varying feature mapping for speech recognition.
2982-2985
- Dong Yu, Li Deng:
Deep-structured hidden conditional random fields for phonetic recognition.
2986-2989
- Jonathan Malkin, Jeff A. Bilmes:
Semi-supervised learning for improved expression of uncertainty in discriminative classifiers.
2990-2993
- Peder A. Olsen, Vaibhava Goel, Charles A. Micchelli, John R. Hershey:
Modeling posterior probabilities using the linear exponential family.
2994-2997
Spoken Dialogue Systems I, II
- Fabrice Lefèvre, François Mairesse, Steve Young:
Cross-lingual spoken language understanding from unaligned data using discriminative classification models and machine translation.
78-81
- Rajesh Balchandran, Leonid Rachevsky, Bhuvana Ramabhadran, Miroslav Novak:
Techniques for topic detection based processing in spoken dialog systems.
82-85
- Senthilkumar Chandramohan, Matthieu Geist, Olivier Pietquin:
Optimizing spoken dialogue management with fitted value iteration.
86-89
- Filip Jurcícek, Blaise Thomson, Simon Keizer, François Mairesse, Milica Gasic, Kai Yu, Steve Young:
Natural belief-critic: a reinforcement algorithm for parameter estimation in statistical spoken dialogue systems.
90-93
- Alexander Schmitt, Michael Scholz, Wolfgang Minker, Jackson Liscombe, David Suendermann:
Is it possible to predict task completion in automated troubleshooters?
94-97
- David Suendermann, Jackson Liscombe, Roberto Pieraccini:
Minimally invasive surgery for spoken dialog systems.
98-101
Spoken Dialogue Systems II
- Ramón López-Cózar, David Griol:
New technique to enhance the performance of spoken dialogue systems based on dialogue states-dependent language models and grammatical rules.
2998-3001
- Lluís F. Hurtado, Joaquin Planells, Encarna Segarra, Emilio Sanchis, David Griol:
A stochastic finite-state transducer approach to spoken dialog management.
3002-3005
- Romain Laroche, Philippe Bretier, Ghislain Putois:
Enhanced monitoring tools and online dialogue optimisation merged into a new spoken dialogue system design experience.
3006-3009
- Romain Laroche, Ghislain Putois, Philippe Bretier:
Optimising a handcrafted dialogue system design.
3010-3013
- Felix Putze, Tanja Schultz:
Utterance selection for speech acts in a cognitive tourguide scenario.
3014-3017
- Gabriel Parent, Maxine Eskenazi:
Lexical entrainment of real users in the Let's Go spoken dialog system.
3018-3021
- Silvia Quarteroni, Meritxell González, Giuseppe Riccardi, Sebastian Varges:
Combining user intention and error modeling for statistical dialog simulators.
3022-3025
- Jaakko Hakulinen, Markku Turunen, Raul Santos de la Camara, Nigel Crook:
Parallel processing of interruptions and feedback in Companions affective dialogue system.
3026-3029
- Antoine Raux, Neville Mehta, Deepak Ramachandran, Rakesh Gupta:
Dynamic language modeling using Bayesian networks for spoken dialog systems.
3030-3033
- Sunao Hara, Norihide Kitaoka, Kazuya Takeda:
Automatic detection of task-incompleted dialog for spoken dialog system based on dialog act n-gram.
3034-3037
- Wei-Bin Liang, Chung-Hsien Wu, Yu-Cheng Hsiao:
Dialogue act detection in error-prone spoken dialogue systems using partial sentence tree and latent dialogue act matrix.
3038-3041
- Tatsuya Kawahara, Kouhei Sumi, Zhi-Qiang Chang, Katsuya Takanashi:
Detection of hot spots in poster conversations based on reactive tokens of audience.
3042-3045
- Yoichi Matsuyama, Shinya Fujie, Hikaru Taniyama, Tetsunori Kobayashi:
Psychological evaluation of a group communication activation robot in a party game.
3046-3049
- Kyoko Matsuyama, Kazunori Komatani, Ryu Takeda, Toru Takahashi, Tetsuya Ogata, Hiroshi G. Okuno:
Analyzing user utterances in barge-in-able spoken dialogue system for improving identification accuracy.
3050-3053
- Mattias Heldner, Jens Edlund, Julia Hirschberg:
Pitch similarity in the vicinity of backchannels.
3054-3057
- Khiet P. Truong, Ronald Poppe, Dirk Heylen:
A rule-based backchannel prediction model using pitch and pause information.
3058-3061
Speech Perception:
Factors Influencing Perception
Prosody:
Models
- Tomás Dubeda, Katalin Mády:
Nucleus position within the intonation phrase: a typological study of English, Czech and Hungarian.
126-129
- Yong-cheol Lee, Satoshi Nambu:
Focus-sensitive operator or focus inducer: always and only.
130-133
- Jiahong Yuan, Mark Liberman:
F0 declination in English and Mandarin broadcast news speech.
134-137
- Katrin Schweitzer, Michael Walsh, Bernd Möbius, Hinrich Schütze:
Frequency of occurrence effects on pitch accent realisation.
138-141
- César González Ferreras, Carlos Vivaracho-Pascual, David Escudero Mancebo, Valentín Cardeñoso-Payo:
On the automatic ToBI accent type identification from data.
142-145
- Andrew Rosenberg:
AutoBI - a tool for automatic ToBI annotation.
146-149
Speech Synthesis:
Unit Selection and Others
- Volker Strom, Simon King:
A classifier-based target cost for unit selection speech synthesis trained on perceptual data.
150-153
- Wei Zhang, Xiaodong Cui:
Applying scalable phonetic context similarity in unit selection of concatenative text-to-speech.
154-157
- Mitsuaki Isogai, Hideyuki Mizuno:
Speech database reduction method for corpus-based TTS system.
158-161
- Heng Lu, Zhen-Hua Ling, Si Wei, Li-Rong Dai, Ren-Hua Wang:
Automatic error detection for unit selection speech synthesis using log likelihood ratio based SVM classifier.
162-165
- Hanna Silén, Elina Helander, Jani Nurminen, Konsta Koppinen, Moncef Gabbouj:
Using robust Viterbi algorithm and HMM-modeling in unit selection TTS to replace units of poor quality.
166-169
- Yeon-Jun Kim, Marc C. Beutnagel:
Automatic detection of abnormal stress patterns in unit selection synthesis.
170-173
- Daniel Tihelka, Jirí Kala, Jindrich Matousek:
Enhancements of Viterbi search for fast unit selection synthesis.
174-177
- Thomas Ewender, Beat Pfister:
Accurate pitch marking for prosodic modification of speech segments.
178-181
- Shifeng Pan, Meng Zhang, Jianhua Tao:
A novel hybrid approach for Mandarin speech synthesis.
182-185
- Josafá de Jesus Aguiar Pontes, Sadaoki Furui:
Modeling liaison in French by using decision trees.
186-189
- Jian Luan, Jian Li:
Improvement on plural unit selection and fusion.
190-193
- Alok Parlikar, Alan W. Black, Stephan Vogel:
Improving speech synthesis of machine translation output.
194-197
- Ghislain Putois, Jonathan Chevelu, Cédric Boidin:
Paraphrase generation to improve text-to-speech synthesis.
198-201
ASR:
Search, Decoding and Confidence Measures I, II
- Chang Woo Han, Shin Jae Kang, Chul Min Lee, Nam Soo Kim:
Phone mismatch penalty matrices for two-stage keyword spotting via multi-pass phone recognizer.
202-205
- Petr Motlícek, Fabio Valente, Philip N. Garner:
English spoken term detection in multilingual recordings.
206-209
- Icksang Han, Chiyoun Park, Jeongmi Cho, Jeongsu Kim:
A hybrid approach to robust word lattice generation via acoustic-based word detection.
210-213
- Volker Steinbiss, Martin Sundermeyer, Hermann Ney:
Direct observation of pruning errors (DOPE): a search analysis tool.
214-217
- David Rybach, Michael Riley:
Direct construction of compact context-dependency transducers from data.
218-221
- Miroslav Novak:
Incremental composition of static decoding graphs with label pushing.
222-225
- Zhanlei Yang, Wenju Liu:
A novel path extension framework using steady segment detection for Mandarin speech recognition.
226-229
- Ralf Schlüter, Markus Nußbaum-Thom, Hermann Ney:
On the relation of Bayes risk, word error, and word posteriors in ASR.
230-233
- David Nolden, Hermann Ney, Ralf Schlüter:
Time conditioned search in automatic speech recognition reconsidered.
234-237
- Satoshi Kobashikawa, Taichi Asami, Yoshikazu Yamaguchi, Hirokazu Masataki, Satoshi Takahashi:
Efficient data selection for speech recognition based on prior confidence estimation using speech and context independent models.
238-241
- Atsunori Ogawa, Atsushi Nakamura:
A novel confidence measure based on marginalization of jointly estimated error cause probabilities.
242-245
- Julien Fayolle, Fabienne Moreau, Christian Raymond, Guillaume Gravier, Patrick Gros:
CRF-based combination of contextual features to improve a posteriori word-level confidence measures.
1942-1945
- Martin Wöllmer, Florian Eyben, Björn Schuller, Gerhard Rigoll:
Recognition of spontaneous conversational speech using long short-term memory phoneme predictions.
1946-1949
- Thomas Pellegrini, Isabel Trancoso:
Improving ASR error detection with non-decoder based features.
1950-1953
- Ladan Golipour, Douglas D. O'Shaughnessy:
Phoneme classification and lattice rescoring based on a k-NN approach.
1954-1957
- Jeff Bilmes, Hui Lin:
Online adaptive learning for speech recognition decoding.
1958-1961
- Takaaki Hori, Shinji Watanabe, Atsushi Nakamura:
Improvements of search error risk minimization in Viterbi beam search for speech recognition.
1962-1965
Special-Purpose Speech Applications
- Robin Hofe, Stephen R. Ell, Michael J. Fagan, James M. Gilbert, Phil D. Green, Roger K. Moore, Sergey I. Rybchenko:
Evaluation of a silent speech interface based on magnetic sensing.
246-249
- Rubén San Segundo, Verónica López, Raquel Martín, Syaheerah L. Lutfi, Javier Ferreiros, Ricardo de Córdoba, José Manuel Pardo:
Advanced speech communication system for deaf people.
250-253
- Sethserey Sam, Eric Castelli, Laurent Besacier:
Unsupervised acoustic model adaptation for multi-origin non native ASR.
254-257
- Dilek Hakkani-Tür, Dimitra Vergyri, Gökhan Tür:
Speech-based automated cognitive status assessment.
258-261
- Toru Imai, Shinichi Homma, Akio Kobayashi, Takahiro Oku, Shoei Sato:
Speech recognition with a seamlessly updated language model for real-time closed-captioning.
262-265
- Takuya Nishimoto, Takayuki Watanabe:
The comparison between the deletion-based methods and the mixing-based methods for audio CAPTCHA systems.
266-269
- Martine Adda-Decker, Lori Lamel, Natalie D. Snoeren:
Comparing mono- & multilingual acoustic seed models for a low e-resourced language: a case-study of Luxembourgish.
270-273
- R. J. J. H. van Son, Irene Jacobi, Frans Hilgers:
Manipulating tracheoesophageal speech.
274-277
- David Imseng, Hervé Bourlard, Mathew Magimai-Doss:
Towards mixed language speech recognition systems.
278-281
- Etienne Barnard, Johan Schalkwyk, Charl Johannes van Heerden, Pedro J. Moreno:
Voice search for development.
282-285
- Gina-Anne Levow, Susan Duncan, Edward T. King:
Cross-cultural investigation of prosody in verbal feedback in interactional rapport.
286-289
- Mary Tai Knox, Gerald Friedland:
Multimodal speaker diarization using oriented optical flow histograms.
290-293
- Catherine Middag, Yvan Saeys, Jean-Pierre Martens:
Towards an ASR-free objective analysis of pathological speech.
294-297
Speech Analysis
- Keith W. Godin, John H. L. Hansen:
Session variability contrasts in the MARP corpus.
298-301
- Kazuhiro Kondo, Yusuke Takano:
Estimation of two-to-one forced selection intelligibility scores by speech recognizers using noise-adapted models.
302-305
- Thomas Schaaf, Florian Metze:
Analysis of gender normalization using MLP and VTLN features.
306-309
- Guillaume Aimetti, Roger K. Moore, Louis ten Bosch:
Discovering an optimal set of minimally contrasting acoustic speech units: a point of focus for whole-word pattern matching.
310-313
- Themos Stafylakis, Xavier Anguera:
Improvements to the equal-parameter BIC for speaker diarization.
314-317
- Nima Mesgarani, Samuel Thomas, Hynek Hermansky:
A multistream multiresolution framework for phoneme recognition.
318-321
- Giampiero Salvi, Fabio Tesser, Enrico Zovato, Piero Cosi:
Cluster analysis of differential spectral envelopes on emotional speech.
322-325
- Sam Bowman, Karen Livescu:
Modeling pronunciation variation with context-dependent articulatory feature decision trees.
326-329
- Bhiksha Raj, Kevin W. Wilson, Alexander Krueger, Reinhold Haeb-Umbach:
Ungrounded independent non-negative factor analysis.
330-333
- John R. Hershey, Peder A. Olsen, Steven J. Rennie:
Signal interaction and the devil function.
334-337
Systems for LVCSR
- Yuya Akita, Masato Mimura, Graham Neubig, Tatsuya Kawahara:
Semi-automated update of automatic transcription system for the Japanese national congress.
338-341
- Xunying Liu, Mark J. F. Gales, Philip C. Woodland:
Language model cross adaptation for LVCSR system combination.
342-345
- Shinji Watanabe, Takaaki Hori, Atsushi Nakamura:
Large vocabulary continuous speech recognition using WFST-based linear classifier for structured data.
346-349
- Pavel Kveton, Miroslav Novak:
Accelerating hierarchical acoustic likelihood computation on graphics processors.
350-353
- Jiulong Shan, Genqing Wu, Zhihong Hu, Xiliu Tang, Martin Jansche, Pedro J. Moreno:
Search by voice in Mandarin Chinese.
354-357
- Thomas Hain, Lukas Burget, John Dines, Philip N. Garner, Asmaa El Hannani, Marijn Huijbregts, Martin Karafiát, Mike Lincoln, Vincent Wan:
The AMIDA 2009 meeting transcription system.
358-361
Speaker Characterization and Recognition I-IV
- William M. Campbell, Zahi N. Karam:
Simple and efficient speaker comparison using approximate KL divergence.
362-365
- Hanwu Sun, Bin Ma, Chien-Lin Huang, Trung Hieu Nguyen, Haizhou Li:
The IIR NIST SRE 2008 and 2010 summed channel speaker recognition systems.
366-369
- Chien-Lin Huang, Hanwu Sun, Bin Ma, Haizhou Li:
Speaker characterization using long-term and temporal information.
370-373
- Sergio Perez-Gomez, Daniel Ramos, Javier Gonzalez-Dominguez, Joaquin Gonzalez-Rodriguez:
Score-level compensation of extreme speech duration variability in speaker verification.
374-377
- Alberto Abad, Isabel Trancoso:
Speaker recognition experiments using connectionist transformation network features.
378-381
- Yun Lei, John H. L. Hansen:
Speaker recognition using supervised probabilistic principal component analysis.
382-385
- Benjamin Bigot, Julien Pinquier, Isabelle Ferrane, Régine André-Obrecht:
Looking for relevant features for speaker role recognition.
1057-1060
- Marcel Kockmann, Lukas Burget, Ondrej Glembek, Luciana Ferrer, Jan Cernocký:
Prosodic speaker verification using subspace multinomial models with intersession compensation.
1061-1064
- Eryu Wang, Kong-Aik Lee, Bin Ma, Haizhou Li, Wu Guo, Li-Rong Dai:
The estimation and kernel metric of spectral correlation for text-independent speaker verification.
1065-1068
- Rahim Saeidi, Pejman Mowlaee, Tomi Kinnunen, Zheng-Hua Tan, Mads Græsbøll Christensen, Søren Holdt Jensen, Pasi Fränti:
Improving monaural speaker identification by double-talk detection.
1069-1072
- B. Avinash, S. Guruprasad, B. Yegnanarayana:
Exploring subsegmental and suprasegmental features for a text-dependent speaker verification in distant speech signals.
1073-1076
- Qingsong Liu, Wei Huang, Dongxing Xu, Hongbin Cai, Beiqian Dai:
A fast implementation of factor analysis for speaker verification.
1077-1080
- Ce Zhang, Rong Zheng, Bo Xu:
An investigation into direct scoring methods without SVM training in speaker verification.
1437-1440
- Reda Jourani, Khalid Daoudi, Régine André-Obrecht, Driss Aboutajdine:
Large margin Gaussian mixture models for speaker identification.
1441-1444
- Rong Zheng, Bo Xu:
On the use of Gaussian component information in the generative likelihood ratio estimation for speaker verification.
1445-1448
- Man-Wai Mak, Wei Rao:
Acoustic vector resampling for GMMSVM-based speaker verification.
1449-1452
- Konstantin Biatov:
A fast speaker indexing using vector quantization and second order statistics with adaptive threshold computation.
1453-1456
- Gang Wang, Xiaojun Wu, Thomas Fang Zheng:
Using phoneme recognition and text-dependent speaker verification to improve speaker segmentation for Chinese speech.
1457-1460
- Claudio Garretón, Néstor Becerra Yoma:
On enhancing feature sequence filtering with filter-bank energy transformation in speaker verification with telephone speech.
1461-1464
- Donglai Zhu, Bin Ma, Kong-Aik Lee, Cheung-Chi Leung, Haizhou Li:
MAP estimation of subspace transform for speaker recognition.
1465-1468
- Ayeh Jafari, Ramji Srinivasan, Danny Crookes, Ji Ming:
A longest matching segment approach for text-independent speaker recognition.
1469-1472
- Ville Hautamäki, Tomi Kinnunen, Mohaddeseh Nosratighods, Kong-Aik Lee, Bin Ma, Haizhou Li:
Approaching human listener accuracy with modern speaker verification.
1473-1476
- Jouni Pohjalainen, Rahim Saeidi, Tomi Kinnunen, Paavo Alku:
Extended weighted linear prediction (XLP) analysis of speech and its application to speaker verification in adverse conditions.
1477-1480
- Guoli Ye, Brian Mak:
The use of subvector quantization and discrete densities for fast GMM computation for speaker verification.
1481-1484
- Fred S. Richardson, Joseph P. Campbell:
Transcript-dependent speaker recognition using Mixer 1 and 2.
2102-2105
- Thomas Drugman, Thierry Dutoit:
On the potential of glottal signatures for speaker recognition.
2106-2109
- R. Padmanabhan, Hema A. Murthy:
Acoustic feature diversity and speaker verification.
2110-2113
- Omid Dehzangi, Bin Ma, Engsiong Chng, Haizhou Li:
A discriminative performance metric for GMM-UBM speaker identification.
2114-2117
- Xavier Anguera, Jean-François Bonastre:
A novel speaker binary key derived from anchor models.
2118-2121
- Weiqiang Zhang, Yan Deng, Liang He, Jia Liu:
Variant time-frequency cepstral features for speaker recognition.
2122-2125
- Ning Wang, P. C. Ching, Tan Lee:
Exploitation of phase information for speaker recognition.
2126-2129
- Yanhua Long, Li-Rong Dai, Bin Ma, Wu Guo:
Effects of the phonological relevance in speaker verification.
2130-2133
- Gabriel H. Sierra, Jean-François Bonastre, Driss Matrouf, José R. Calvo:
Topological representation of speech for speaker recognition.
2134-2137
- Seyed Omid Sadjadi, John H. L. Hansen:
Assessment of single-channel speech enhancement techniques for speaker identification under mismatched conditions.
2138-2141
- Xiang Zhang, Chuan Cao, Lin Yang, Hongbin Suo, Jianping Zhang, Yonghong Yan:
Speaker recognition using the resynthesized speech via spectrum modeling.
2142-2145
Source Separation
- Robert Peharz, Michael Stark, Franz Pernkopf, Yannis Stylianou:
A factorial sparse coder model for single channel source separation.
386-389
- Yasmina Benabderrahmane, Sid-Ahmed Selouani, Douglas D. O'Shaughnessy:
Oriented PCA method for blind speech separation of convolutive mixtures.
390-393
- Hsin-Lung Hsieh, Jen-Tzung Chien:
Online Gaussian process for nonstationary speech separation.
394-397
- Meng Yu, Wenye Ma, Jack Xin, Stanley Osher:
Convexity and fast speech extraction by split Bregman method.
398-401
- Wenye Ma, Meng Yu, Jack Xin, Stanley Osher:
Reducing musical noise in blind source separation by time-domain sparse filters and split Bregman method.
402-405
- John Woodruff, Rohit Prabhavalkar, Eric Fosler-Lussier, DeLiang Wang:
Combining monaural and binaural evidence for reverberant speech segregation.
406-409
Speech Synthesis:
HMM-Based Speech Synthesis I, II
- Heiga Zen:
Speaker and language adaptive training for HMM-based polyglot speech synthesis.
410-413
- Kai Yu, Heiga Zen, François Mairesse, Steve Young:
Context adaptive training with factorized decision trees for HMM-based speech synthesis.
414-417
- Junichi Yamagishi, Oliver Watts, Simon King, Bela Usabaev:
Roles of the average voice in speaker-adaptive HMM-based speech synthesis.
418-421
- Yao Qian, Zhi-Jie Yan, Yi-Jian Wu, Frank K. Soong, Xin Zhuang, Shengyi Kong:
An HMM trajectory tiling (HTT) approach to high quality TTS.
422-425
- Yining Chen, Zhi-Jie Yan, Frank K. Soong:
A perceptual study of acceleration parameters in HMM-based TTS.
426-429
- Shuji Yokomizo, Takashi Nose, Takao Kobayashi:
Evaluation of prosodic contextual factors for HMM-based speech synthesis.
430-433
- Slava Shechtman, Alexander Sorin:
Sinusoidal model parameterization for HMM-based TTS system.
805-808
- Yoshinori Shiga, Tomoki Toda, Shinsuke Sakai, Hisashi Kawai:
Improved training of excitation for HMM-based parametric speech synthesis.
809-812
- June Sig Sung, Doo Hwa Hong, Kyung Hwan Oh, Nam Soo Kim:
Excitation modeling based on waveform interpolation for HMM-based speech synthesis.
813-816
- Xin Zhuang, Yao Qian, Frank K. Soong, Yi-Jian Wu, Bo Zhang:
Formant-based frequency warping for improving speaker adaptation in HMM TTS.
817-820
- Hongwei Hu, Martin J. Russell:
Improved modelling of speech dynamics using non-linear formant trajectories for HMM-based speech synthesis.
821-824
- Zhen-Hua Ling, Yu Hu, Li-Rong Dai:
Global variance modeling on the log power spectrum of LSPs for HMM-based speech synthesis.
825-828
- Matt Shannon, William Byrne:
Autoregressive clustering for HMM speech synthesis.
829-832
- Nicholas Pilkington, Heiga Zen:
An implementation of decision tree-based context clustering on graphics processing units.
833-836
- Alexander Gutkin, Xavi Gonzalvo, Stefan Breuer, Paul Taylor:
Quantized HMMs for low footprint text-to-speech synthesis.
837-840
- Oliver Watts, Junichi Yamagishi, Simon King:
The role of higher-level linguistic features in HMM-based speech synthesis.
841-844
- Ayami Mase, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda:
HMM-based singing voice synthesis system using pitch-shifted pseudo training data.
845-848
- Jinfu Ni, Hisashi Kawai:
An unsupervised approach to creating web audio contents-based HMM voices.
849-852
- Tomoki Koriyama, Takashi Nose, Takao Kobayashi:
Conversational spontaneous speech synthesis using average voice model.
853-856
Multi-Modal Signal Processing
- Jonas Hörnstein, José Santos-Victor:
Learning words and speech units through natural interactions.
434-437
- Qingju Liu, Wenwu Wang, Philip J. B. Jackson:
Bimodal coherence based scale ambiguity cancellation for target speech extraction and enhancement.
438-441
- Hiroaki Kawashima, Yu Horii, Takashi Matsuyama:
Speech estimation in non-stationary noise environments using timing structures between mouth movements and sound signals.
442-445
- Lijuan Wang, Xiaojun Qian, Wei Han, Frank K. Soong:
Synthesizing photo-real talking head via trajectory-guided sample selection.
446-449
- Victoria M. Florescu, Lise Crevier-Buchman, Bruce Denby, Thomas Hueber, Antonia Colazo-Simon, Claire Pillot-Loiseau, Pierre Roussel-Ragot, Cédric Gendrot, Sophie Quattrocchi:
Silent vs vocalized articulation for a portable ultrasound-based silent speech interface.
450-453
- Gregor Hofer, Korin Richmond:
Comparison of HMM and TMDN methods for lip synchronisation.
454-457
Paralanguage
- Florian Schiel, Christian Heinrich, Veronika Neumeyer:
Rhythm and formant features for automatic alcohol detection.
458-461
- Irena Yanushevskaya, Christer Gobl, John Kane, Ailbhe Ní Chasaide:
An exploration of voice source correlates of focus.
462-465
- James D. Harnsberger, Rahul Shrivastav, W. S. Brown Jr.:
Modeling perceived vocal age in American English.
466-469
- Marie-José Caraty, Claude Montacié:
Multivariate analysis of vocal fatigue in continuous reading.
470-473
- Alexander Kain, Jan P. H. van Santen:
Frequency-domain delexicalization using surrogate vowels.
474-477
- Florian Metze, Anton Batliner, Florian Eyben, Tim Polzehl, Björn Schuller, Stefan Steidl:
Emotion recognition using imperfect speech recognition.
478-481
- Gang Liu, Yun Lei, John H. L. Hansen:
A novel feature extraction strategy for multi-stream robust emotion identification.
482-485
- Asterios Toutios, Utpala Musti, Slim Ouni, Vincent Colotte, Brigitte Wrobel-Dautcourt, Marie-Odile Berger:
Setup for acoustic-visual speech synthesis by concatenating bimodal units.
486-489
- Bart Jochems, Martha Larson, Roeland Ordelman, Ronald Poppe, Khiet P. Truong:
Towards affective state modeling in narrative and conversational settings.
490-493
- Narichika Nomoto, Hirokazu Masataki, Osamu Yoshioka, Satoshi Takahashi:
Detection of anger emotion in dialog speech using prosody feature and temporal relation of utterances.
494-497
- Benjamin Roustan, Marion Dohen:
Gesture and speech coordination: the influence of the relationship between manual gesture and speech.
498-501
- Hynek Boril, Seyed Omid Sadjadi, Tristan Kleinschmidt, John H. L. Hansen:
Analysis and detection of cognitive load and frustration in drivers' speech.
502-505
- Akira Sasou, Yasuharu Hashimoto, Katsuhiko Sakaue:
Acoustic-based recognition of head gestures accompanying speech.
506-509
- Sandro Castronovo, Angela Mahr, Margarita Pentcheva, Christian A. Müller:
Multimodal dialog in the car: combining speech and turn-and-push dial to control comfort functions.
510-513
- Danil Korchagin, Philip N. Garner, Petr Motlícek:
Hands free audio analysis from home entertainment.
514-517
- Shaikh Mostafa Al Masum, Antonio Rui Ferreira Rebordão, Keikichi Hirose:
Affective story teller: a TTS system for emotional expressivity.
518-521
ASR:
Speaker Adaptation, Robustness Against Reverberation
- Shweta Ghai, Rohit Sinha:
Enhancing children's speech recognition under mismatched condition by explicit acoustic normalization.
522-525
- Bo Li, Khe Chai Sim:
Comparison of discriminative input and output transformations for speaker adaptation in the hybrid NN/HMM systems.
526-529
- Ravichander Vipperla, Steve Renals, Joe Frankel:
Augmentation of adaptation data.
530-533
- Lukás Machlica, Zbynek Zajíc, Ludek Müller:
Discriminative adaptation based on fast combination of DMAP and dfMLLR.
534-537
- Doddipatla Rama Sanand, Ralf Schlüter, Hermann Ney:
Revisiting VTLN using linear transformation on conventional MFCC.
538-541
- Toyohiro Hayashi, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda:
Speaker adaptation based on nonlinear spectral transform for speech recognition.
542-545
- Tetsuo Kosaka, Takashi Ito, Masaharu Katoh, Masaki Kohda:
Speaker adaptation based on system combination using speaker-class models.
546-549
- Yongwon Jeong, Young Rok Song, Hyung Soon Kim:
Speaker adaptation in transformation space using two-dimensional PCA.
550-553
- Jan Trmal, Jan Zelinka, Ludek Müller:
On speaker adaptive training of artificial neural networks.
554-557
- Yongjun He, Jiqing Han:
Model synthesis for band-limited speech recognition.
558-561
- Takahiro Fukumori, Masanori Morise, Takanobu Nishiura:
Performance estimation of reverberant speech recognition based on reverberant criteria RSR-dn with acoustic parameters.
562-565
- Armin Sehr, Christian Hofmann, Roland Maas, Walter Kellermann:
A novel approach for matched reverberant training of HMMs using data pairs.
566-569
- Hari Krishna Maganti, Marco Matassoni:
An auditory based modulation spectral feature for reverberant speech recognition.
570-573
- Martin Wolf, Climent Nadeu:
On the potential of channel selection for recognition of reverberated speech with multiple microphones.
574-577
- Randy Gomez, Tatsuya Kawahara:
An improved wavelet-based dereverberation for robust automatic speech recognition.
578-581
- Rico Petrick, Thomas Fehér, Masashi Unoki, Rüdiger Hoffmann:
Methods for robust speech recognition in reverberant environments: a comparison.
582-585
Language Learning, TTS, and Other Applications
- Masayuki Suzuki, Yu Qiao, Nobuaki Minematsu, Keikichi Hirose:
Integration of multilayer regression analysis with structure-based pronunciation assessment.
586-589
- Joost van Doremalen, Catia Cucchiarini, Helmer Strik:
Using non-native error patterns to improve pronunciation verification.
590-593
- Dean Luo, Yu Qiao, Nobuaki Minematsu, Yutaka Yamauchi, Keikichi Hirose:
Regularized-MLLR speaker adaptation for computer-assisted language learning system.
594-597
- Kuniaki Hirabayashi, Seiichi Nakagawa:
Automatic evaluation of English pronunciation by Japanese speakers using various acoustic features and pattern recognition techniques.
598-601
- Hsien-Cheng Liao, Jiang-Chun Chen, Sen-Chia Chang, Ying-Hua Guan, Chin-Hui Lee:
Decision tree based tone modeling with corrective feedbacks for automatic Mandarin tone assessment.
602-605
- Jingli Lu, Ruili Wang, Liyanage C. De Silva, Yang Gao, Jia Liu:
CASTLE: a computer-assisted stress teaching and learning environment for learners of English as a second language.
606-609
- Shen Huang, Hongyan Li, Shijin Wang, Jiaen Liang, Bo Xu:
Automatic reference independent evaluation of prosody quality using multiple knowledge fusions.
610-613
- Su-Youn Yoon, Mark Hasegawa-Johnson, Richard Sproat:
Landmark-based automated pronunciation error detection.
614-617
- Zhiwei Shuang, Shiyin Kang, Yong Qin, Li-Rong Dai, Lianhong Cai:
HMM based TTS for mixed language text.
618-621
- Hui Liang, John Dines:
An analysis of language mismatch in HMM state mapping-based cross-lingual speaker adaptation.
622-625
- Tatsuya Kawahara, Norihiro Katsumaru, Yuya Akita, Shinsuke Mori:
Classroom note-taking system for hearing impaired students using automatic speech recognition adapted to lectures.
626-629
- Paul R. Dixon, Sadaoki Furui:
Exploring web-browser based runtimes engines for creating ubiquitous speech interfaces.
630-632
Pitch and Glottal-Waveform Estimation and Modeling I, II
- Xuejing Sun, Sameer Gadre:
Efficient three-stage pitch estimation for packet loss concealment.
633-636
- Keiichi Funaki:
On evaluation of the F0 estimation based on time-varying complex speech analysis.
637-640
- Feng Huang, Tan Lee:
Pitch estimation in noisy speech based on temporal accumulation of spectrum peaks.
641-644
- Tianyu T. Wang, Thomas F. Quatieri:
Multi-pitch estimation by a joint 2-D representation of pitch and pitch dynamics.
645-648
- Pirros Tsiakoulis, Alexandros Potamianos:
On the effect of fundamental frequency on amplitude and frequency modulation patterns in speech resonances.
649-652
- M. Shahidur Rahman, Tetsuya Shimamura:
Pitch determination using autocorrelation function in spectral domain.
653-656
- Thomas Drugman, Thierry Dutoit:
Chirp complex cepstrum-based decomposition for asynchronous glottal analysis.
657-660
- Alan Ó Cinnéide, David Dorran, Mikel Gainza, Eugene Coyle:
Exploiting glottal formant parameters for glottal inverse filtering and parameterization.
661-664
- Nicolas Sturmel, Christophe d'Alessandro, Boris Doval:
Glottal parameters estimation on speech using the zeros of the z-transform.
665-668
- Sri Harish Reddy Mallidi, Kishore Prahallad, Suryakanth V. Gangashetty, B. Yegnanarayana:
Significance of pitch synchronous analysis for speaker recognition using AANN models.
669-672
- Gang Chen, Xue Feng, Yen-Liang Shue, Abeer Alwan:
On using voice source measures in automatic gender classification of children's speech.
673-676
- Wei Chu, Abeer Alwan:
SAFE: a statistical algorithm for F0 estimation for both clean and noisy speech.
2590-2593
- Jung Ook Hong, Patrick J. Wolfe:
Robust and efficient pitch estimation using an iterative ARMA technique.
2594-2597
- Yasunori Ohishi, Hirokazu Kameoka, Daichi Mochihashi, Hidehisa Nagano, Kunio Kashino:
Statistical modeling of F0 dynamics in singing voices based on Gaussian processes with multiple oscillation bases.
2598-2601
- Martin Heckmann, Claudius Gläser, Frank Joublin, Kazuhiro Nakadai:
Applying geometric source separation for improved pitch extraction in human-robot interaction.
2602-2605
- John Kane, Mark Kane, Christer Gobl:
A spectral LF model based approach to voice source parameterisation.
2606-2609
- Thomas Drugman, Thierry Dutoit:
Glottal-based analysis of the Lombard effect.
2610-2613
Open Vocabulary Spoken Document Retrieval (Special Session)
- Yoshiaki Itoh, Hiromitsu Nishizaki, Xinhui Hu, Hiroaki Nanjo, Tomoyosi Akiba, Tatsuya Kawahara, Seiichi Nakagawa, Tomoko Matsui, Yoichi Yamashita, Kiyoaki Aikawa:
Constructing Japanese test collections for spoken term detection.
677-680
- Satoshi Natori, Hiromitsu Nishizaki, Yoshihiro Sekiguchi:
Japanese spoken term detection using syllable transition network derived from multiple speech recognizers' outputs.
681-684
- Sha Meng, Weiqiang Zhang, Jia Liu:
Combining Chinese spoken term detection systems via side-information conditioned linear logistic regression.
685-688
- Taisuke Kaneko, Tomoyosi Akiba:
Metric subspace indexing for fast spoken term detection.
689-692
- Chun-an Chan, Lin-Shan Lee:
Unsupervised spoken-term detection with spoken queries using segment-based dynamic time warping.
693-696
- Daniel Schneider, Timo Mertens, Martha Larson, Joachim Köhler:
Contextual verification for open vocabulary spoken term detection.
697-700
- Javier Tejedor, Doroteo Torre Toledano, Miguel Bautista, Simon King, Dong Wang, José Colás:
Augmented set of features for confidence estimation in spoken term detection.
701-704
- Xinhui Hu, Ryosuke Isotani, Hisashi Kawai, Satoshi Nakamura:
Cluster-based language model for spoken document retrieval using NMF-based document clustering.
705-708
Robust ASR
- Rogier C. van Dalen, Mark J. F. Gales:
Asymptotically exact noise-corrupted speech likelihoods.
709-712
- Ramón Fernandez Astudillo, Reinhold Orglmeister:
A MMSE estimator in mel-cepstral domain for robust large vocabulary automatic speech recognition using uncertainty propagation.
713-716
- Bhiksha Raj, Tuomas Virtanen, Sourish Chaudhuri, Rita Singh:
Non-negative matrix factorization based compensation of music for automatic speech recognition.
717-720
- Kris Demuynck, Xueru Zhang, Dirk Van Compernolle, Hugo Van Hamme:
Feature versus model based noise robustness.
721-724
- Ji Hun Park, Seon Man Kim, Jae Sam Yoon, Hong Kook Kim, Sung Joo Lee, Yunkeun Lee:
SNR-based mask compensation for computational auditory scene analysis applied to speech recognition in a car environment.
725-728
- Chanwoo Kim, Richard M. Stern, Kiwan Eom, Jaewon Lee:
Automatic selection of thresholds for signal separation algorithms based on interaural delay.
729-732
Language and Dialect Identification
- Florian Verdet, Driss Matrouf, Jean-François Bonastre, Jean Hennebert:
Channel detectors for system fusion in the context of NIST LRE 2009.
733-736
- Rong Tong, Bin Ma, Haizhou Li, Engsiong Chng:
Selecting phonotactic features for language recognition.
737-740
- Abualsoud Hanani, Michael J. Carey, Martin J. Russell:
Improved language recognition using mixture components statistics.
741-744
- Mikel Peñagarikano, Amparo Varona, Luis Javier Rodríguez-Fuentes, Germán Bordel:
Using cross-decoder co-occurrences of phone n-grams in SVM-based phonotactic language recognition.
745-748
- Oscar Koller, Alberto Abad, Isabel Trancoso, Céu Viana:
Exploiting variety-dependent phones in Portuguese variety identification applied to broadcast news transcription.
749-752
- Fadi Biadsy, Julia Hirschberg, Michael Collins:
Dialect recognition using a phone-GMM-supervector-based SVM kernel.
753-756
Technologies for Learning and Education
- Xiaojun Qian, Frank K. Soong, Helen M. Meng:
Discriminative acoustic model for improving mispronunciation detection and diagnosis in computer-aided pronunciation training (CAPT).
757-760
- Liang-Yu Chen, Jyh-Shing Roger Jang:
Automatic pronunciation scoring using learning to rank and DP-based score segmentation.
761-764
- Wai Kit Lo, Shuang Zhang, Helen M. Meng:
Automatic derivation of phonological rules for mispronunciation detection in a computer-assisted pronunciation training system.
765-768
- Minh Duong, Jack Mostow:
Adapting a duration synthesis model to rate children's oral reading prosody.
769-772
- Su-Youn Yoon, Lei Chen, Klaus Zechner:
Predicting word accuracy for the automatic speech recognition of non-native speech.
773-776
- Taotao Zhu, Dengfeng Ke, Zhenbiao Chen, Bo Xu:
A new approach for automatic tone error detection in strong accented Mandarin based on dominant set.
777-780
Emotional Speech
- S. R. Mahadeva Prasanna, D. Govind:
Analysis of excitation source information in emotional speech.
781-784
- Dongrui Wu, Thomas D. Parsons, Shrikanth S. Narayanan:
Acoustic feature analysis in speech emotion primitives estimation.
785-788
- Lan-Ying Yeh, Tai-Shih Chi:
Spectro-temporal modulations for robust speech emotion recognition.
789-792
- Chi-Chun Lee, Matthew Black, Athanasios Katsamanis, Adam C. Lammert, Brian R. Baucom, Andrew Christensen, Panayiotis G. Georgiou, Shrikanth S. Narayanan:
Quantification of prosodic entrainment in affective spontaneous spoken interactions of married couples.
793-796
- Emily Mower, Kyu Jeong Han, Sungbok Lee, Shrikanth S. Narayanan:
A cluster-profile representation of emotion using agglomerative hierarchical clustering.
797-800
- Björn Schuller, Laurence Devillers:
Incremental acoustic valence recognition: an inter-corpus perspective on features, matching, and performance in a gating paradigm.
801-804
New Paradigms in ASR I, II
- Xiaodong Wang, Kunihiko Owa, Makoto Shozakai:
Mandarin digit recognition assisted by selective tone distinction.
857-860
- Kazuhiko Abe, Sakriani Sakti, Ryosuke Isotani, Hisashi Kawai, Satoshi Nakamura:
Brazilian Portuguese acoustic model training based on data borrowing from other language.
861-864
- Ngoc Thang Vu, Tim Schlippe, Franziska Kraus, Tanja Schultz:
Rapid bootstrapping of five Eastern European languages using the Rapid Language Adaptation Toolkit.
865-868
- Houwei Cao, Tan Lee, P. C. Ching:
Cross-lingual speaker adaptation via Gaussian component mapping.
869-872
- Mohamed Elmahdy, Rainer Gruhn, Wolfgang Minker, Slim Abdennadher:
Cross-lingual acoustic modeling for dialectal Arabic speech recognition.
873-876
- Samuel Thomas, Sriram Ganapathy, Hynek Hermansky:
Cross-lingual and multi-stream posterior features for low resource LVCSR systems.
877-880
- Shiva Sundaram, Jerome R. Bellegarda:
Latent perceptual mapping: a new acoustic modeling framework for speech recognition.
881-884
- Richard Dufour, Fethi Bougares, Yannick Estève, Paul Deléglise:
Unsupervised model adaptation on targeted speech segments for LVCSR system combination.
885-888
- Irene Ayllón Clemente, Martin Heckmann, Alexander Denecke, Britta Wrede, Christian Goerick:
Incremental word learning using large-margin discriminative training and variance floor estimation.
889-892
- Tuomas Virtanen, Jort F. Gemmeke, Antti Hurmalainen:
State-based labelling for a sparse representation of speech and its application to robust speech recognition.
893-896
- Mirko Hannemann, Stefan Kombrink, Martin Karafiát, Lukas Burget:
Similarity scoring for recognizing repeated out-of-vocabulary words.
897-900
- Dino Seppi, Dirk Van Compernolle:
Data pruning for template-based automatic speech recognition.
901-904
- Man-Hung Siu, Herbert Gish, Arthur Chan, William Belfield:
Improved topic classification and keyword discovery using an HMM-based speech recognizer trained without supervision.
2838-2841
- Dimitri Kanevsky, Tara N. Sainath, Bhuvana Ramabhadran, David Nahamoo:
An analysis of sparseness and regularization in exemplar-based methods for speech classification.
2842-2845
- Abdel-rahman Mohamed, Dong Yu, Li Deng:
Investigation of full-sequence training of deep belief networks for speech recognition.
2846-2849
- Yow-Bang Wang, Lin-Shan Lee:
Mandarin tone recognition using affine-invariant prosodic features and tone posteriorgram.
2850-2853
- Geoffrey Zweig, Patrick Nguyen, Jasha Droppo, Alex Acero:
Continuous speech recognition with a TF-IDF acoustic model.
2854-2857
- Geoffrey Zweig, Patrick Nguyen:
SCARF: a segmental conditional random field toolkit for speech recognition.
2858-2861
Speech Production:
Various Approaches
- Akiko Amano-Kusumoto, John-Paul Hosom, Alexander Kain:
Speaking style dependency of formant targets.
905-908
- Tatsuya Kitamura:
Similarity of effects of emotions on the speech organ configuration with and without speaking.
909-912
- Daniel Bone, Samuel Kim, Sungbok Lee, Shrikanth S. Narayanan:
A study of intra-speaker and inter-speaker affective variability using electroglottograph and inverse filtered glottal waveforms.
913-916
- Ken-Ichi Sakakibara, Hiroshi Imagawa, Miwako Kimura, Hisayuki Yokonishi, Niro Tayama:
Modal analysis of vocal fold vibrations using laryngotopography.
917-920
- Martti Vainio, Matti Airas, Juhani Järvikivi, Paavo Alku:
Laryngeal voice quality in the expression of focus.
921-924
- Masako Fujimoto, Kikuo Maekawa, Seiya Funatsu:
Laryngeal characteristics during the production of geminate consonants.
925-928
- Julien Cisonni, Kazunori Nozaki, Annemie Van Hirtum, Shigeo Wada:
Numerical study of turbulent flow-induced sound production in presence of a tooth-shaped obstacle: towards sibilant [s] physical modeling.
929-932
- Iris Hanique, Barbara Schuppler, Mirjam Ernestus:
Morphological and predictability effects on schwa reduction: the case of Dutch word-initial syllables.
933-936
- Samer Al Moubayed, G. Ananthakrishnan:
Acoustic-to-articulatory inversion based on local regression.
937-940
- Mirjam Broersma:
Korean lenis, fortis, and aspirated stops: effect of place of articulation on acoustic realization.
941-944
- Toru Nakashika, Ryuki Tachibana, Masafumi Nishimura, Tetsuya Takiguchi, Yasuo Ariki:
Speech synthesis by modeling harmonics structure with multiple function.
945-948
- Makoto Otani, Tatsuya Hirahara:
Physics of body-conducted silent speech - production, propagation and representation of non-audible murmur.
949-952
Speech Enhancement
- Subhojit Chakladar, Nam Soo Kim, Yu Gwang Jin, Tae Gyoon Kang:
Multichannel noise reduction using low order RTF estimate.
953-956
- Inho Lee, Jongsung Yoon, Yoonjae Lee, Hanseok Ko:
Reinforced blocking matrix with cross channel projection for speech enhancement.
957-960
- Ning Cheng, Wenju Liu, Lan Wang:
Masking property based microphone array post-filter design.
961-964
- Yusuke Sato, Tetsuya Hoya, Hovagim Bakardjian, Andrzej Cichocki:
Reduction of broadband noise in speech signals by multilinear subspace analysis.
965-968
- Jungpyo Hong, Seung Ho Han, Sangbae Jeong, Minsoo Hahn:
Novel probabilistic control of noise reduction for improved microphone array beamforming.
969-972
- Kai Li, Qiang Fu, Yonghong Yan:
Speech enhancement using improved generalized sidelobe canceller in frequency domain with multi-channel postfiltering.
973-976
- Jani Even, Carlos Toshinori Ishi, Hiroshi Saruwatari, Norihiro Hagita:
Close speaker cancellation for suppression of non-stationary background noise for hands-free speech interface.
977-980
- Ajay Srinivasamurthy, Thippur V. Sreenivas:
Multi-channel iterative dereverberation based on codebook constrained iterative multi-channel Wiener filter.
981-984
- Anand Joseph Xavier Medabalimi, Sri Harish Reddy Mallidi, B. Yegnanarayana:
Speaker-dependent mapping of source and system features for enhancement of throat microphone speech.
985-988
- Jun Cai, Stefano Marini, Pierre Malarme, Francis Grenez, Jean Schoentgen:
An analytic modeling approach to enhancing throat microphone speech commands for keyword spotting.
989-992
- Stephen So, Kamil K. Wójcicki, Kuldip K. Paliwal:
Single-channel speech enhancement using Kalman filtering in the modulation domain.
993-996
- Miao Yao, Weiqian Liang:
Integrated feedback and noise reduction algorithm in digital hearing aids via oscillation detection.
997-1000
- Charles Mercier, Roch Lefebvre:
A blind signal-to-noise ratio estimator for high noise speech recordings.
1001-1004
Fact and Replica of Speech Production (Special Session)
- Hiroshi Imagawa, Ken-Ichi Sakakibara, Isao T. Tokuda, Mamiko Otsuka, Niro Tayama:
Estimation of glottal area function using stereo-endoscopic high-speed digital imaging.
1005-1008
- Kazunori Nozaki, Youhei Ohnishi, Takashi Suda, Shigeo Wada, Shinji Shimojo:
Toward aero-acoustical analysis of the sibilant /s/: an oral cavity modeling.
1009-1012
- Kunitoshi Motoki:
Effects of wall impedance on transmission and attenuation of higher-order modes in vocal-tract model.
1013-1016
- Peter Birkholz, Bernd J. Kröger, Christiane Neuschaefer-Rube:
Articulatory synthesis and perception of plosive-vowel syllables with virtual consonant targets.
1017-1020
- Kotaro Fukui, Toshihiro Kusano, Yoshikazu Mukaeda, Yuto Suzuki, Atsuo Takanishi, Masaaki Honda:
Speech robot mimicking human articulatory motion.
1021-1024
- Takayuki Arai:
Mechanical vocal-tract models for speech dynamics.
1025-1028
- Michael C. Brady:
Prosodic timing analysis for articulatory re-synthesis using a bank of resonators with an adaptive oscillator.
1029-1032
ASR:
Language Modeling
- Ahmad Emami, Stanley F. Chen, Abraham Ittycheriah, Hagen Soltau, Bing Zhao:
Decoding with shrinkage-based language models.
1033-1036
- Stanley F. Chen, Stephen M. Chu:
Enhanced word classing for model M.
1037-1040
- Junho Park, Xunying Liu, Mark J. F. Gales, Philip C. Woodland:
Improved neural network based language modelling and adaptation.
1041-1044
- Tomas Mikolov, Martin Karafiát, Lukas Burget, Jan Cernocký, Sanjeev Khudanpur:
Recurrent neural network based language model.
1045-1048
- Preethi Jyothi, Eric Fosler-Lussier:
Discriminative language modeling using simulated ASR errors.
1049-1052
- Graham Neubig, Masato Mimura, Shinsuke Mori, Tatsuya Kawahara:
Learning a language model from continuous speech.
1053-1056
Single-Channel Speech Enhancement
Speech Synthesis:
Miscellaneous Topics
- Udochukwu Kalu Ogbureke, Peter Cahill, Julie Carson-Berndsen:
Hidden Markov models with context-sensitive observations for grapheme-to-phoneme conversion.
1105-1108
- Brian Langner, Stephan Vogel, Alan W. Black:
Evaluating a dialog language generation system: comparing the MOUNTAIN system to other NLG approaches.
1109-1112
- Wesley Mattheyses, Lukas Latacz, Werner Verhelst:
Active appearance models for photorealistic visual speech synthesis.
1113-1116
- Jerome R. Bellegarda:
Latent affective mapping: a novel framework for the data-driven analysis of emotion in text.
1117-1120
- Anna C. Janska, Robert A. J. Clark:
Native and non-native speaker judgements on the quality of synthesized speech.
1121-1124
- Dominic Espinosa, Michael White, Eric Fosler-Lussier, Chris Brew:
Machine learning for text selection with expressive unit-selection voices.
1125-1128
Prosody:
Basics & Applications
- Alexei V. Ivanov, Giuseppe Riccardi, S. Ghosh, Sara Tonelli, Evgeny A. Stepanov:
Acoustic correlates of meaning structure in conversational speech.
1129-1132
- Nicolas Obin, Xavier Rodet, Anne Lacheret:
HMM-based prosodic structure model using rich linguistic context.
1133-1136
- Charlotte Wollermann, Bernhard Schröder, Ulrich Schade:
Audiovisual congruence and pragmatic focus marking.
1137-1140
- Margaret Zellers, Michele Gubian, Brechtje Post:
Redescribing intonational categories with functional data analysis.
1141-1144
- Shen Huang, Hongyan Li, Shijin Wang, Jiaen Liang, Bo Xu:
Exploring goodness of prosody by diverse matching templates.
1145-1148
- Mickael Rouvier, Richard Dufour, Georges Linarès, Yannick Estève:
A language-identification inspired method for spontaneous speech detection.
1149-1152
- Gérard Bailly, Amélie Lelong:
Speech dominoes and phonetic convergence.
1153-1156
- Mátyás Brendel, Riccardo Zaccarelli, Laurence Devillers:
A quick sequential forward floating feature selection algorithm for emotion detection from speech.
1157-1160
- Géza Kiss, Jan P. H. van Santen:
Automated vocal emotion recognition using phoneme class specific features.
1161-1164
- Adrian Pass, Jianguo Zhang, Darryl Stewart:
Feature selection for pose invariant lip biometrics.
1165-1168
- Hussein Hussein, Rüdiger Hoffmann:
Signal-based accent and phrase marking using the Fujisaki model.
1169-1172
- Jangwon Kim, Sungbok Lee, Shrikanth S. Narayanan:
A study of interplay between articulatory movement and prosodic characteristics in emotional speech production.
1173-1176
ASR:
Feature Extraction I, II
- Shang-wen Li, Liang-Che Sun, Lin-Shan Lee:
Improved phoneme recognition by integrating evidence from spectro-temporal and cepstral features.
1177-1180
- Suman V. Ravuri, Nelson Morgan:
Using spectro-temporal features to improve AFE feature extraction for ASR.
1181-1184
- Ibon Saratxaga, Inma Hernáez, Igor Odriozola, Eva Navas, Iker Luengo, Daniel Erro:
Using harmonic phase information to improve ASR rate.
1185-1188
- Kazumasa Yamamoto, Eiichi Sueyoshi, Seiichi Nakagawa:
Speech recognition using long-term phase information.
1189-1192
- Jan Zelinka, Jan Trmal, Ludek Müller:
Low-dimensional space transforms of posteriors in speech recognition.
1193-1196
- Christian Plahl, Ralf Schlüter, Hermann Ney:
Hierarchical bottle neck features for LVCSR.
1197-1200
- Frantisek Grézl, Martin Karafiát:
Hierarchical neural net architectures for feature extraction in ASR.
1201-1204
- Vivek Kumar Rangarajan Sridhar, Rohit Prasad, Prem Natarajan:
Mutual information analysis for feature and sensor subset selection in surface electromyography based speech recognition.
1205-1208
- Bernd T. Meyer, Birger Kollmeier:
Learning from human errors: prediction of phoneme confusions based on modified ASR training.
1209-1212
- Bo Li, Khe Chai Sim:
Hidden logistic linear regression for support vector machine based phone verification.
2614-2617
- Tim Ng, Bing Zhang, Long Nguyen:
Jointly optimized discriminative features for speech recognition.
2618-2621
- Florian Müller, Alfred Mertins:
Invariant integration features combined with speaker-adaptation methods.
2622-2625
- Mark Raugas, Vivek Kumar Rangarajan Sridhar, Rohit Prasad, Prem Natarajan:
Multi resolution discriminative models for subvocalic speech recognition.
2626-2629
- Fabio Valente, Mathew Magimai-Doss, Christian Plahl, Suman V. Ravuri, Wen Wang:
A comparative large scale study of MLP features for Mandarin ASR.
2630-2633
- Cong-Thanh Do, Dominique Pastor, Gaël Le Lan, André Goalic:
Recognizing cochlear implant-like spectrally reduced speech with HMM-based ASR: experiments with MFCCs and PLP coefficients.
2634-2637
Speech Perception:
Cross Language and Age
- Kazuhiro Kondo, Takayuki Kanda, Yosuke Kobayashi, Hiroyuki Yagyu:
Speech intelligibility of diagonally localized speech with competing noise using bone-conduction headphones.
1213-1216
- Pierre L. Divenyi:
Masking of vowel-analog transitions by vowel-analog distracters.
1217-1220
- François Pellegrino, Emmanuel Ferragne, Fanny Meunier:
2010, a speech oddity: phonetic transcription of reversed speech.
1221-1224
- Hsin-Yi Lin, Janice Fon:
Perception on pitch reset at discourse boundaries.
1225-1228
- Marjorie Dole, Michel Hoen, Fanny Meunier:
Effect of spatial separation on speech-in-noise comprehension in dyslexic adults.
1229-1232
- Ellen Marklund, Francisco Lacerda, Anna Ericsson:
Speech categorization context effects in seven- to nine-month-old infants.
1233-1236
- Diane Kewley-Port, Larry E. Humes, Daniel Fogerty:
Changes in temporal processing of speech across the adult lifespan.
1237-1240
- Jared Bernstein, Jian Cheng, Masanori Suzuki:
Fluency and structural complexity as predictors of L2 oral proficiency.
1241-1244
- Marco van de Ven, Benjamin V. Tucker, Mirjam Ernestus:
Semantic facilitation in bilingual everyday speech comprehension.
1245-1248
- Bo-ren Hsieh, Ho-hsien Pan:
L2 experience and non-native vowel categorization of L1-Mandarin speakers.
1249-1252
- Mirjam Wester:
Cross-lingual talker discrimination.
1253-1256
- Takashi Otake:
Dajare is not the lowest form of wit.
1257-1260
SLP Systems
- Rafael Torres, Shota Takeuchi, Hiromichi Kawanami, Tomoko Matsui, Hiroshi Saruwatari, Kiyohiro Shikano:
Comparison of methods for topic classification in a speech-oriented guidance system.
1261-1264
- Pere Comas, Jordi Turmo, Lluís Màrquez:
Using dependency parsing and machine learning for factoid question answering on spoken documents.
1265-1268
- Carolina Parada, Abhinav Sethy, Mark Dredze, Frederick Jelinek:
A spoken term detection framework for recovering out-of-vocabulary words using the web.
1269-1272
- Hung-yi Lee, Chia-Ping Chen, Ching-feng Yeh, Lin-Shan Lee:
Improved spoken term detection by discriminative training of acoustic models based on user relevance feedback.
1273-1276
- Sebastian Tschöpel, Daniel Schneider:
A lightweight keyword and tag-cloud retrieval algorithm for automatic speech recognition transcripts.
1277-1280
- Noboru Kanedera, Tetsuo Funada, Seiichi Nakagawa:
Lecture subtopic retrieval by retrieval keyword expansion using subordinate concept.
1281-1284
- Hiroaki Nanjo, Yusuke Iyonaga, Takehiko Yoshimi:
Spoken document retrieval for oral presentations integrating global document similarities into local document similarities.
1285-1288
- Joseph Polifroni, Stephanie Seneff:
Combining word-based features, statistical language models, and parsing for named entity recognition.
1289-1292
- Azeddine Zidouni, Sophie Rosset, Hervé Glotin:
Efficient combined approach for named entity recognition in spoken language.
1293-1296
- Sree Harsha Yella, Vasudeva Varma, Kishore Prahallad:
Prominence based scoring of speech segments for automatic speech-to-speech summarization.
1297-1300
- Zihan Liu, Lei Xie, Wei Feng:
Maximum lexical cohesion for fine-grained news story segmentation.
1301-1304
- Xiaoxuan Wang, Lei Xie, Bin Ma, Engsiong Chng, Haizhou Li:
Phoneme lattice based TextTiling towards multilingual story segmentation.
1305-1308
Quality of Experiencing Speech Services (Special Session)
- Anton Schlesinger, Marinus M. Boone:
The characterization of the relative information content by spectral features for the objective intelligibility assessment of nonlinearly processed speech.
1309-1312
- Marcel Wältermann, Alexander Raake, Sebastian Möller:
Analytical assessment and distance modeling of speech transmission quality.
1313-1316
- Nicolas Côté, Vincent Koehl, Valérie Gautier-Turbin, Alexander Raake, Sebastian Möller:
An intrusive super-wideband speech quality model: DIAL.
1317-1320
- Sebastian Egger, Raimund Schatz, Stefan Scherer:
It takes two to tango - assessing the impact of delay on conversational interactivity on perceived speech quality.
1321-1324
- Sebastian Möller, Florian Hinterleitner, Tiago H. Falk, Tim Polzehl:
Comparison of approaches for instrumentally predicting the quality of text-to-speech systems.
1325-1328
- Imre Kiss, Joseph Polifroni, Chao Wang, Ghinwa F. Choueiter, Mike Phillips:
A hybrid architecture for mobile voice user interfaces.
1329-1332
- Markku Turunen, Jaakko Hakulinen, Tomi Heimonen:
Assessment of spoken and multimodal applications: lessons learned from laboratory and field studies.
1333-1336
- Klaus-Peter Engelbrecht, Hamed Ketabdar, Sebastian Möller:
Improving cross database prediction of dialogue quality using mixture of experts.
1337-1340
Language Processing
Speech and Audio Segmentation
- Sarah Hoffmann, Beat Pfister:
Fully automatic segmentation for prosodic speech corpora.
1389-1392
- Vahid Khanagha, Khalid Daoudi, Oriol Pont, Hussein M. Yahia:
A novel text-independent phonetic segmentation algorithm based on the microcanonical multiscale formalism.
1393-1396
- You-Yu Lin, Yih-Ru Wang, Yuan-Fu Liao:
Phone boundary detection using sample-based acoustic parameters.
1397-1400
- Utpala Musti, Asterios Toutios, Slim Ouni, Vincent Colotte, Brigitte Wrobel-Dautcourt, Marie-Odile Berger:
HMM-based automatic visual speech segmentation using facial data.
1401-1404
- David Wang, Robert Vogt, Sridha Sridharan:
Bayes factor based speaker segmentation for speaker diarization.
1405-1408
- Qiang Huang, Stephen J. Cox:
Using high-level information to detect key audio events in a tennis game.
1409-1412
Prosody:
Analysis
- Catherine Lai:
What do you mean, you're uncertain?: the interpretation of cue words and rising intonation in dialogue.
1413-1416
- Yi-Fen Liu, Shu-Chuan Tseng, Jyh-Shing Roger Jang, C.-H. Alvin Chen:
Coping imbalanced prosodic unit boundary detection with linguistically-motivated prosodic features.
1417-1420
- Zhigang Chen, Guoping Hu, Wei Jiang:
Improving prosodic phrase prediction by unsupervised adaptation and syntactic features extraction.
1421-1424
- Yujia Li, Tan Lee:
Perception-based automatic approximation of F0 contours in Cantonese speech.
1425-1428
- Raul Fernandez, Bhuvana Ramabhadran:
Discriminative training and unsupervised adaptation for labeling prosodic events with limited training data.
1429-1432
- Erin Cvejic, Jeesun Kim, Chris Davis, Guillaume Gibert:
Prosody for the eyes: quantifying visual prosody using guided principal component analysis.
1433-1436
Systems for LVCSR and Rich Transcription
- Naveen Parihar, Ralf Schlüter, David Rybach, Eric A. Hansen:
Parallel lexical-tree based LVCSR on multi-core processors.
1485-1488
- Jike Chong, Ekaterina Gonina, Kisun You, Kurt Keutzer:
Exploring recognition network representations for efficient speech inference on highly parallel platforms.
1489-1492
- Diamantino Caseiro:
WFST compression for automatic speech recognition.
1493-1496
- Ivan Bulyko:
Speech recognizer optimization under speed constraints.
1497-1500
- Florian Metze, Roger Hsiao, Qin Jin, Udhyakumar Nallasamy, Tanja Schultz:
The 2010 CMU GALE speech-to-text system.
1501-1504
- Tin Lay Nwe, Hanwu Sun, Bin Ma, Haizhou Li:
Speaker diarization in meeting audio for single distant microphone.
1505-1508
- Fernando Batista, Helena Moniz, Isabel Trancoso, Hugo Meinedo, Ana Isabel Mata, Nuno J. Mamede:
Extending the punctuation module for European Portuguese.
1509-1512
- Sakriani Sakti, Ryosuke Isotani, Hisashi Kawai, Satoshi Nakamura:
Utilizing a noisy-channel approach for Korean LVCSR.
1513-1516
- Markus Nußbaum-Thom, Simon Wiesler, Martin Sundermeyer, Christian Plahl, Stefan Hahn, Ralf Schlüter, Hermann Ney:
The RWTH 2009 Quaero ASR evaluation system for English and German.
1517-1520
Phonetics
- Benjamin Munson, Renata Solum:
When is indexical information about speech activated? evidence from a cross-modal priming experiment.
1521-1524
- Benjamin Munson:
The influence of actual and perceived sexual orientation on diadochokinetic rate in women and men.
1525-1528
- Kristine M. Yu:
Laryngealization and features for Chinese tonal recognition.
1529-1532
- Viet Son Nguyen, Eric Castelli, René Carré:
Production and perception of Vietnamese short vowels in V1V2 context.
1533-1536
- Gertraud Fenk-Oczlon, August Fenk:
Measuring basic tempo across languages and some implications for speech rhythm.
1537-1540
- Yukari Hirata, Shigeaki Amano:
Durational structure of Japanese single/geminate stops in three- and four-mora words spoken at varied rates.
1541-1544
- Shin-ichiro Sano, Tomohiko Ooigawa:
Distribution and trichotomic realization of voiced velars in Japanese - an experimental study.
1545-1548
- Jagoda Sieczkowska, Bernd Möbius, Grzegorz Dogil:
Specification in context - devoicing processes in Polish, French, American English and German sonorants.
1549-1552
- Kuniko Nielsen:
Phonetic imitation of Japanese vowel devoicing.
1553-1556
- Mary Stevens, John Hajek:
Post-aspiration in standard Italian: some first cross-regional acoustic evidence.
1557-1560
- Mirko Grimaldi, Andrea Calabrese, Francesco Sigona, Luigina Garrapa, Bianca Sisinni:
Articulatory grounding of southern Salentino harmony processes.
1561-1564
- Yuuki Tanida, Taiji Ueno, Satoru Saito, Matthew A. Lambon Ralph:
Effects of accent typicality and phonotactic frequency on nonword immediate serial recall performance in Japanese.
1565-1567
- Osamu Fujimura:
How abstract is phonetics?
1568-1571
Speech Production:
Vocal Tract Modeling and Imaging
- Adam C. Lammert, Michael I. Proctor, Shrikanth S. Narayanan:
Data-driven analysis of realtime vocal tract MRI using correlated image regions.
1572-1575
- Michael I. Proctor, Daniel Bone, Athanasios Katsamanis, Shrikanth S. Narayanan:
Rapid semi-automatic segmentation of real-time magnetic resonance images for parametric vocal tract analysis.
1576-1579
- Yoon-Chul Kim, Shrikanth S. Narayanan, Krishna S. Nayak:
Improved real-time MRI of oral-velar coordination using a golden-ratio spiral view order.
1580-1583
- Erik Bresch, Athanasios Katsamanis, Louis Goldstein, Shrikanth S. Narayanan:
Statistical multi-stream modeling of real-time MRI articulatory speech data.
1584-1587
- G. Ananthakrishnan, Pierre Badin, Julián Andrés Valdés Vargas, Olov Engwall:
Predicting unseen articulations from multi-speaker articulatory models.
1588-1591
- Chao Qin, Miguel Á. Carreira-Perpiñán:
Estimating missing data sequences in x-ray microbeam recordings.
1592-1595
- Chao Qin, Miguel Á. Carreira-Perpiñán, Mohsen Farhadloo:
Adaptation of a tongue shape model by local feature transformations.
1596-1599
- Sungbok Lee, Shrikanth S. Narayanan:
Vocal tract contour analysis of emotional speech by the functional data curve representation.
1600-1603
- Adam C. Lammert, Louis Goldstein, Khalil Iskarous:
Locally-weighted regression for estimating the forward kinematics of a geometric vocal tract model.
1604-1607
- Michael Reimer, Frank Rudzicz:
Identifying articulatory goals from kinematic data using principal differential analysis.
1608-1611
- Zuheng Ming, Denis Beautemps, Gang Feng, Sébastien Schmerber:
Estimation of speech lip features from discrete cosinus transform.
1612-1615
- Farzaneh Ahmadi, Ian Vince McLoughlin, Hamid R. Sharifzadeh:
Autoregressive modelling for linear prediction of ultrasonic speech.
1616-1619
Speech Intelligibility Enhancement for All Ages, Health Conditions and Environments (Special Session)
- Takayuki Arai, Nao Hodoshima:
Enhanced speech yielding higher intelligibility for all listeners and environments.
1620-1623
- Seyed Omid Sadjadi, Sanjay A. Patil, John H. L. Hansen:
Quality conversion of non-acoustic signals for facilitating human-to-human speech communication under harsh acoustic conditions.
1624-1627
- Keigo Nakamura, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano:
The use of air-pressure sensor in electrolaryngeal speech enhancement based on statistical voice conversion.
1628-1631
- Gibak Kim, Philipos C. Loizou:
A new binary mask based on noise constraints for improved speech intelligibility.
1632-1635
- Yan Tang, Martin Cooke:
Energy reallocation strategies for speech enhancement in known noise conditions.
1636-1639
- Jing Chen, Thomas Baer, Brian C. J. Moore:
Effects of enhancement of spectral changes on speech quality and subjective speech intelligibility.
1640-1643
ASR:
Acoustic Model Adaptation
- Catherine Breslin, K. K. Chin, Mark J. F. Gales, Kate Knill, Haitian Xu:
Prior information for rapid speaker adaptation.
1644-1647
- Jonas Lööf, Ralf Schlüter, Hermann Ney:
Discriminative adaptation for log-linear acoustic models.
1648-1651
- Dimitra Vergyri, Lori Lamel, Jean-Luc Gauvain:
Automatic speech recognition of multiple accented English data.
1652-1655
- Jinyu Li, Yu Tsao, Chin-Hui Lee:
Shrinkage model adaptation in automatic speech recognition.
1656-1659
- Jinyu Li, Dong Yu, Yifan Gong, Li Deng:
Unscented transform with online distortion estimation for HMM adaptation.
1660-1663
- Michael L. Seltzer, Alex Acero:
HMM adaptation using linear spline interpolation with integrated spline parameter training for robust speech recognition.
1664-1667
SLP Systems for Information Extraction/Retrieval
- Dong Wang, Simon King, Nicholas W. D. Evans, Raphaël Troncy:
CRF-based stochastic pronunciation modeling for out-of-vocabulary spoken term detection.
1668-1671
- Chia-Ping Chen, Hung-yi Lee, Ching-feng Yeh, Lin-Shan Lee:
Improved spoken term detection by feature space pseudo-relevance feedback.
1672-1675
- Aren Jansen, Kenneth Church, Hynek Hermansky:
Towards spoken term discovery at scale with zero resources.
1676-1679
- Evandro B. Gouvêa, Tony Ezzat:
Vocabulary independent spoken query: a case for subword units.
1680-1683
- Shih-Hsiang Lin, Yao-Ming Yeh, Berlin Chen:
Extractive speech summarization - from the view of decision theory.
1684-1687
- Gabriel Murray, Giuseppe Carenini, Raymond T. Ng:
The impact of ASR on abstractive vs. extractive meeting summaries.
1688-1691
Speech Representation
- Li Deng, Michael L. Seltzer, Dong Yu, Alex Acero, Abdel-rahman Mohamed, Geoffrey E. Hinton:
Binary coding of speech spectrograms using a deep auto-encoder.
1692-1695
- Juhan Nam, Gautham J. Mysore, Joachim Ganseman, Kyogu Lee, Jonathan S. Abel:
A super-resolution spectrogram using coupled PLCA.
1696-1699
- Georgios Tzedakis, Yannis Pantazis, Olivier Rosec, Yannis Stylianou:
Fast least-squares solution for sinusoidal, harmonic and quasi-harmonic models.
1700-1703
- Afsaneh Asaei, Hervé Bourlard, Philip N. Garner:
Sparse component analysis for speech recognition in multi-speaker environment.
1704-1707
- Trond Skogstad, Torbjørn Svendsen:
Intra-frame variability as a predictor of frame classifiability.
1708-1711
- Tetsuya Shimamura, Ngoc Dinh Nguyen:
Autocorrelation and double autocorrelation based spectral representations for a noisy word recognition system.
1712-1715
Voice Conversion
- Elina Helander, Hanna Silén, Joaquín Míguez, Moncef Gabbouj:
Maximum a posteriori voice conversion using sequential Monte Carlo methods.
1716-1719
- Pierre Lanchantin, Xavier Rodet:
Dynamic model selection for spectral voice conversion.
1720-1723
- Takashi Nose, Takao Kobayashi:
Speaker-independent HMM-based voice conversion using quantized fundamental frequency.
1724-1727
- Daisuke Saito, Shinji Watanabe, Atsushi Nakamura, Nobuaki Minematsu:
Probabilistic integration of joint density model and speaker model for voice conversion.
1728-1731
- Zhi-Zheng Wu, Tomi Kinnunen, Engsiong Chng, Haizhou Li:
Text-independent F0 transformation with non-parallel data for voice conversion.
1732-1735
- Xiaodan Zhuang, Lijuan Wang, Frank K. Soong, Mark Hasegawa-Johnson:
A minimum converted trajectory error (MCTE) approach to high quality speech-to-lips conversion.
1736-1739
Prosody:
Language-Specific Models
- Anastasia Karlsson, David House, Jan-Olof Svantesson, Damrong Tayanin:
Influence of lexical tones on intonation in Kammu.
1740-1743
- Satoshi Nambu, Yong-cheol Lee:
Phonetic realization of second occurrence focus in Japanese.
1744-1747
- Jianjing Kuang:
Prosodic grouping and relative clause disambiguation in Mandarin.
1748-1751
- Ya Li, Jianhua Tao, Meng Zhang, Shifeng Pan, Xiaoying Xu:
Text-based unstressed syllable prediction in Mandarin.
1752-1755
- Tomás Dubeda:
"flat pitch accents" in Czech.
1756-1759
- Tomás Dubeda:
Positional variability of pitch accents in Czech.
1760-1763
- Shyamal Das Mandal, Arup Saha, Tulika Basu, Keikichi Hirose, Hiroya Fujisaki:
Modeling of sentence-medial pauses in Bangla readout speech: occurrence and duration.
1764-1767
- Adrian Leemann, Lucy Zuberbühler:
Declarative sentence intonation patterns in 8 Swiss German dialects.
1768-1771
- Je Hun Jeon, Yang Liu:
Syllable-level prominence detection with acoustic evidence.
1772-1775
- Sankalan Prasad, Kalika Bali:
Prosody cues for classification of the discourse particle "hã" in Hindi.
1776-1779
- Yuan Jia, Aijun Li:
Interaction of syntax-marked focus and wh-question induced focus in standard Chinese.
1780-1783
- Samer Al Moubayed, Jonas Beskow:
Prominence detection in Swedish using syllable correlates.
1784-1787
- Na Zhi, Daniel Hirst, Pier Marco Bertinetto:
Automatic analysis of the intonation of a tone language. Applying the Momel algorithm to spontaneous standard Chinese (Beijing).
1788-1791
- Raymond W. M. Ng, Cheung-Chi Leung, Ville Hautamäki, Tan Lee, Bin Ma, Haizhou Li:
Towards long-range prosodic attribute modeling for language recognition.
1792-1795
- Robert Schubert, Oliver Jokisch, Diane Hirschfeld:
A modified parameterization of the Fujisaki model.
1796-1799
ASR:
Language Modeling and Speech Understanding I
- Saeedeh Momtazi, Friedrich Faubel, Dietrich Klakow:
Within and across sentence boundary language model.
1800-1803
- Ruhi Sarikaya, Stanley F. Chen, Abhinav Sethy, Bhuvana Ramabhadran:
Impact of word classing on shrinkage-based language models.
1804-1807
- Stanislas Oger, Vladimir Popescu, Georges Linarès:
Combination of probabilistic and possibilistic language models.
1808-1811
- Brandon Ballinger, Cyril Allauzen, Alexander Gruenstein, Johan Schalkwyk:
On-demand language model interpolation for mobile speech input.
1812-1815
- Tim Schlippe, Chenfei Zhu, Jan Gebhardt, Tanja Schultz:
Text normalization based on statistical machine translation and internet user support.
1816-1819
- Tanel Alumäe, Mikko Kurimo:
Efficient estimation of maximum entropy language models with n-gram features: an SRILM extension.
1820-1823
- Christian Gillot, Christophe Cerisara, David Langlois, Jean Paul Haton:
Similar n-gram language model.
1824-1827
- Markpong Jongtaveesataporn, Sadaoki Furui:
Topic and style-adapted language modeling for Thai broadcast news ASR.
1828-1831
- Ahmad Emami, Hong-Kwang Jeff Kuo, Imed Zitouni, Lidia Mangu:
Augmented context features for Arabic speech recognition.
1832-1835
- Lucía Ortega, Isabel Galiano, Lluís F. Hurtado, Emilio Sanchis, Encarna Segarra:
A statistical segment-based approach for spoken language understanding.
1836-1839
- Benjamin Lecouteux, Raphaël Rubino, Georges Linarès:
Improving back-off models with bag of words and hollow-grams.
2418-2421
- Ciprian Chelba, Thorsten Brants, Will Neveitt, Peng Xu:
Study on interaction between entropy pruning and Kneser-Ney smoothing.
2422-2425
- Hitoshi Yamamoto, Ken Hanazawa, Kiyokazu Miki, Koichi Shinoda:
Dynamic language model adaptation using keyword category classification.
2426-2429
- Welly Naptali, Masatoshi Tsuchiya, Seiichi Nakagawa:
Integration of cache-based model and topic dependent class model with soft clustering and soft voting.
2430-2433
- Frédéric Duvert, Renato de Mori:
Conditional models for detecting lambda-functions in a spoken language understanding system.
2434-2437
- Md. Akmal Haidar, Douglas D. O'Shaughnessy:
Novel weighting scheme for unsupervised language model adaptation using latent Dirichlet allocation.
2438-2441
- Qun Feng Tan, Kartik Audhkhasi, Panayiotis G. Georgiou, Emil Ettelaie, Shrikanth S. Narayanan:
Automatic speech recognition system channel modeling.
2442-2445
- Takanobu Oba, Takaaki Hori, Atsushi Nakamura:
Round-robin discrimination model for reranking ASR hypotheses.
2446-2449
- Hasim Sak, Murat Saraclar, Tunga Güngör:
On-the-fly lattice rescoring for real-time automatic speech recognition.
2450-2453
First and Second Language Acquisition
- Angela Cooper, Yue Wang:
Cantonese tone word learning by tone and non-tone language speakers.
1840-1843
- Anne Cutler, Janise Shanley:
Validation of a training method for L2 continuous-speech segmentation.
1844-1847
- Jiahong Yuan:
Linguistic rhythm in foreign accent.
1848-1849
- Mee Sonu, Keiichi Tajima, Hiroaki Kato, Yoshinori Sagisaka:
The effect of a word embedded in a sentence and speaking rate variation on the perceptual training of geminate and singleton consonant distinction.
1850-1853
- Chiharu Tsurutani:
Foreign accent matters most when timing is wrong.
1854-1857
- Hyejin Hong, Jina Kim, Minhwa Chung:
Effects of Korean learners' consonant cluster reduction strategies on English speech recognition performance.
1858-1861
- June S. Levitt, William F. Katz:
The effects of EMA-based augmented visual feedback on the English speakers' acquisition of the Japanese flap: a perceptual study.
1862-1865
- Hinako Masuda, Takayuki Arai:
Perception of voiceless fricatives by Japanese listeners of advanced and intermediate level English proficiency.
1866-1869
- Lya Meister, Einar Meister:
Perception of Estonian vowel categories by native and non-native speakers.
1870-1873
- Qin Shi, Kun Li, Shilei Zhang, Stephen M. Chu, Ji Xiao, ZhiJian Ou:
Spoken English assessment system for non-native speakers using acoustic and prosodic features.
1874-1877
- Elena E. Lyakso, Olga V. Frolova, Anna V. Kurazhova, Julia S. Gaikova:
Russian infants and children's sounds and speech corpuses for language acquisition studies.
1878-1881
- Julia Monnin, Hélène Loevenbruck:
Language-specific influence on phoneme development: French and Drehu data.
1882-1885
- Jeffrey J. Holliday, Mary E. Beckman, Chanelle Mays:
Did you say susi or shushi? Measuring the emergence of robust fricative contrasts in English- and Japanese-acquiring children.
1886-1889
Spoken Language Resources, Systems and Evaluation I, II
- Josef R. Novak, Paul R. Dixon, Sadaoki Furui:
An empirical comparison of the T3, Juicer, HDecode and Sphinx3 decoders.
1890-1893
- Philip N. Garner, John Dines:
Tracter: a lightweight dataflow framework.
1894-1897
- Marelie H. Davel, Febe de Wet:
Verifying pronunciation dictionaries using conflict analysis.
1898-1901
- Brandon Roy, Soroush Vosoughi, Deb Roy:
Automatic estimation of transcription accuracy and difficulty.
1902-1905
- Benjamin Lambert, Rita Singh, Bhiksha Raj:
Creating a linguistic plausibility dataset with non-expert annotators.
1906-1909
- Xinhui Hu, Ryosuke Isotani, Hisashi Kawai, Satoshi Nakamura:
Construction and evaluations of an annotated Chinese conversational corpus in travel domain for the language model of speech recognition.
1910-1913
- Thad Hughes, Kaisuke Nakajima, Linne Ha, Atul Vasu, Pedro J. Moreno, Mike LeBeau:
Building transcribed speech corpora quickly and cheaply for many languages.
1914-1917
- Heidi Christensen, Jon Barker, Ning Ma, Phil D. Green:
The CHiME corpus: a resource and a challenge for computational hearing in multisource environments.
1918-1921
- Wen Cao, Dongning Wang, Jinsong Zhang, Ziyu Xiong:
Developing a Chinese L2 speech database of Japanese learners with narrow-phonetic labels for computer assisted pronunciation training.
1922-1925
- Shogo Ishikawa, Shinya Kiriyama, Yoichi Takebayashi, Shigeyoshi Kitazawa:
How children acquire situation understanding skills?: a developmental analysis utilizing multimodal speech behavior corpus.
1926-1929
- Ina Wechsung, Stefan Schaffer, Robert Schleicher, Anja Naumann, Sebastian Möller:
The influence of expertise and efficiency on modality selection strategies and perceived mental effort.
1930-1933
- Christine Kühnel, Benjamin Weiss, Sebastian Möller:
Parameters describing multimodal interaction - definitions and three usage scenarios.
1934-1937
- Alexander Zgorzelski, Alexander Schmitt, Tobias Heinroth, Wolfgang Minker:
Repair strategies on trial: which error recovery do users like best?
1938-1941
- Maryam Kamvar, Doug Beeferman:
Say what? Why users choose to speak their web queries.
1966-1969
- Jonathan Teutenberg, Catherine I. Watson:
The effect of audience familiarity on the perception of modified accent.
1970-1973
- Korin Richmond, Robert A. J. Clark, Susan Fitt:
On generating Combilex pronunciations via morphological analysis.
1974-1977
- Florian Gödde, Sebastian Möller:
Say it as you mean it - analyzing free user comments in the VOICE awards corpus.
1978-1981
- Viktor Rozgic, Bo Xiao, Athanasios Katsamanis, Brian R. Baucom, Panayiotis G. Georgiou, Shrikanth S. Narayanan:
A new multichannel multi modal dyadic interaction database.
1982-1985
- Dau-Cheng Lyu, Tien Ping Tan, Engsiong Chng, Haizhou Li:
SEAME: a Mandarin-English code-switching speech corpus in South-East Asia.
1986-1989
Speech Production:
Analysis
- Daniel Felps, Christian Geng, Michael Berger, Korin Richmond, Ricardo Gutierrez-Osuna:
Relying on critical articulators to estimate vocal tract spectra in an articulatory-acoustic database.
1990-1993
- Vikram Ramanarayanan, Dani Byrd, Louis Goldstein, Shrikanth S. Narayanan:
Investigating articulatory setting - pauses, ready position, and rest - using real-time MRI.
1994-1997
- Chao Qin, Miguel Á. Carreira-Perpiñán:
Articulatory inversion of American English /ɹ/ by conditional density modes.
1998-2001
- Atef Ben Youssef, Pierre Badin, Gérard Bailly:
Can tongue be recovered from face? The answer of data-driven statistical models.
2002-2005
- Francisco Torreira, Mirjam Ernestus:
Phrase-medial vowel devoicing in spontaneous French.
2006-2009
- Chierh Cheng, Yi Xu, Michele Gubian:
Exploring the mechanism of tonal contraction in Taiwan Mandarin.
2010-2013
Paralanguage & Cognition
- Benjamin Weiss, Felix Burkhardt:
Voice attributes affecting likability perception.
2014-2017
- Kristiina Jokinen, Kazuaki Harada, Masafumi Nishida, Seiichi Yamamoto:
Turn-alignment using eye-gaze and speech in conversational interaction.
2018-2021
- Tet Fei Yap, Julien Epps, Eliathamby Ambikairajah, Eric H. C. Choi:
An investigation of formant frequencies for cognitive load classification.
2022-2025
- Martijn Goudbeek, Mirjam Broersma:
Language specific effects of emotion on phoneme duration.
2026-2029
- Matthew Black, Athanasios Katsamanis, Chi-Chun Lee, Adam C. Lammert, Brian R. Baucom, Andrew Christensen, Panayiotis G. Georgiou, Shrikanth S. Narayanan:
Automatic classification of married couples' behavior using audio features.
2030-2033
- Gideon Kowadlo, Patrick Ye, Ingrid Zukerman:
Influence of gestural salience on the interpretation of spoken requests.
2034-2037
Robust ASR Against Noise
- Vikramjit Mitra, Hosung Nam, Carol Y. Espy-Wilson, Elliot Saltzman, Louis Goldstein:
Robust word recognition using articulatory trajectories and gestures.
2038-2041
- Takeshi Yamada, Tomohiro Nakajima, Nobuhiko Kitawaki, Shoji Makino:
Performance estimation of noisy speech recognition considering recognition task complexity.
2042-2045
- Friedrich Faubel, Dietrich Klakow:
Estimating noise from noisy speech features with a Monte Carlo variant of the expectation maximization algorithm.
2046-2049
- Satoshi Tamura, Eriko Hishikawa, Wataru Taguchi, Satoru Hayamizu:
Template-based spectral estimation using microphone array for speech recognition.
2050-2053
- Aleem Mushtaq, Yu Tsao, Chin-Hui Lee:
A particle filter feature compensation approach to robust speech recognition.
2054-2057
- Chanwoo Kim, Richard M. Stern:
Nonlinear enhancement of onset for robust speech recognition.
2058-2061
- Shirin Badiezadegan, Richard C. Rose:
Mask estimation in non-stationary noise environments for missing feature based robust speech recognition.
2062-2065
- Lae-Hoon Kim, Kyung-Tae Kim, Mark Hasegawa-Johnson:
Robust automatic speech recognition with decoder oriented ideal binary mask estimation.
2066-2069
- Gökhan Ince, Kazuhiro Nakadai, Tobias Rodemann, Hiroshi Tsujino, Jun-ichi Imura:
A robust speech recognition system against the ego noise of a robot.
2070-2073
- Kuo-Hao Wu, Chia-Ping Chen:
Empirical mode decomposition for noise-robust automatic speech recognition.
2074-2077
- Wooil Kim, Jun-Won Suh, John H. L. Hansen:
An effective feature compensation scheme tightly matched with speech recognizer employing SVM-based GMM generation.
2078-2081
- Jort F. Gemmeke, Tuomas Virtanen:
Artificial and online acquired noise dictionaries for noise robust ASR.
2082-2085
- Akira Saito, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda:
Voice activity detection based on conditional random fields using multiple features.
2086-2089
- Yong Zhao, Biing-Hwang Juang:
A comparative study of noise estimation algorithms for VTS-based robust speech recognition.
2090-2093
- Frank Seide, Pei Zhao:
On using missing-feature theory with cepstral features - approximations to the multivariate integral.
2094-2097
- Yang Sun, Jort F. Gemmeke, Bert Cranen, Louis ten Bosch, Lou Boves:
Using a DBN to integrate sparse classification and GMM-based ASR.
2098-2101
Voice Conversion and Speech Synthesis
- Axel Röbel:
Shape-invariant speech transformation with the phase vocoder.
2146-2149
- Kayoko Yanagisawa, Mark Huckvale:
A phonetic alternative to cross-language voice conversion in a text-dependent context: evaluation of speaker identity.
2150-2153
- Esther Klabbers, Alexander Kain, Jan P. H. van Santen:
Evaluation of speaker mimic technology for personalizing SGD voices.
2154-2157
- Kumi Ohta, Tomoki Toda, Yamato Ohtani, Hiroshi Saruwatari, Kiyohiro Shikano:
Adaptive voice-quality control based on one-to-many eigenvoice conversion.
2158-2161
- Fernando Villavicencio, Jordi Bonada:
Applying voice conversion to concatenative singing-voice synthesis.
2162-2165
- Miaomiao Wang, Miaomiao Wen, Keikichi Hirose, Nobuaki Minematsu:
Improved generation of fundamental frequency in HMM-based speech synthesis using generation process model.
2166-2169
- Ming Lei, Yi-Jian Wu, Frank K. Soong, Zhen-Hua Ling, Li-Rong Dai:
A hierarchical F0 modeling method for HMM-based speech synthesis.
2170-2173
- Javier Latorre, Mark J. F. Gales, Heiga Zen:
Training a parametric-based logF0 model with the minimum generation error criterion.
2174-2177
- Miaomiao Wen, Miaomiao Wang, Keikichi Hirose, Nobuaki Minematsu:
Improving Mandarin segmental duration prediction with automatically extracted syntax features.
2178-2181
- Daniel R. van Niekerk, Etienne Barnard:
An intonation model for TTS in Sepedi.
2182-2185
- Michael Pucher, Dietmar Schabus, Junichi Yamagishi:
Synthesis of fast speech with interpolation of adapted HSMMs and its evaluation by blind and sighted listeners.
2186-2189
- Gabriel Webster, Sacha Krstulovic, Kate Knill:
A comparison of pronunciation modeling approaches for HMM-TTS.
2190-2193
- Zhen-Hua Ling, Korin Richmond, Junichi Yamagishi:
HMM-based text-to-articulatory-movement prediction and analysis of critical articulators.
2194-2197
Detection, Classification, and Segmentation
- Jiaxing Ye, Takumi Kobayashi, Tetsuya Higuchi:
Audio-based sports highlight detection by Fourier local auto-correlations.
2198-2201
- Hynek Boril, Abhijeet Sangwan, Taufiq Hasan, John H. L. Hansen:
Automatic excitement-level detection for sports highlights generation.
2202-2205
- Jörg-Hendrik Bach, Jörn Anemüller:
Detecting novel objects in acoustic scenes through classifier incongruence.
2206-2209
- Stavros Ntalampiras, Ilyas Potamitis, Nikos Fakotakis:
A multidomain approach for automatic home environmental sound classification.
2210-2213
- Patrick Cardinal, Vishwa Gupta, Gilles Boulianne:
Content-based advertisement detection.
2214-2217
- Stavros Ntalampiras, Ilyas Potamitis, Nikos Fakotakis:
Identification of abnormal audio events based on probabilistic novelty detection.
2218-2221
- Norbert Braunschweiler, Mark J. F. Gales, Sabine Buchholz:
Lightly supervised recognition for automatic alignment of large coherent speech recordings.
2222-2225
- Oshry Ben-Harush, Itshak Lapidot, Hugo Guterman:
Incremental diarization of telephone conversations.
2226-2229
- Srikanth Cherla, V. Ramasubramanian:
Audio analytics by template modeling and 1-pass DP based decoding.
2230-2233
- Mariusz Ziólko, Jakub Galka, Bartosz Ziólko, Tomasz Drwiega:
Perceptual wavelet decomposition for speech segmentation.
2234-2237
- Venkatesh Keri, Kishore Prahallad:
A comparative study of constrained and unconstrained approaches for segmentation of speech signal.
2238-2241
- Morgan Sonderegger, Joseph Keshet:
Automatic discriminative measurement of voice onset time.
2242-2245
- Yi Ren Leng, Tran Huy Dat, Norihide Kitaoka, Haizhou Li:
Selective gammatone filterbank feature for robust sound event recognition.
2246-2249
Compressive Sensing for Speech and Language Processing (Special Session)
- Allen Y. Yang, Zihan Zhou, Yi Ma, Shankar Sastry:
Towards a robust face recognition system using compressive sensing.
2250-2253
- Tara N. Sainath, Bhuvana Ramabhadran, David Nahamoo, Dimitri Kanevsky, Abhinav Sethy:
Sparse representation features for speech recognition.
2254-2257
- Abhinav Sethy, Tara N. Sainath, Bhuvana Ramabhadran, Dimitri Kanevsky:
Data selection for language modeling using sparse representations.
2258-2261
- Jort F. Gemmeke, Ulpu Remes, Kalle J. Palomäki:
Observation uncertainty measures for sparse imputation.
2262-2265
- Tara N. Sainath, Sameer Maskey, Dimitri Kanevsky, Bhuvana Ramabhadran, David Nahamoo, Julia Hirschberg:
Sparse representations for text categorization.
2266-2269
- Garimella S. V. S. Sivaram, Sriram Ganapathy, Hynek Hermansky:
Sparse auto-associative neural networks: theory and application to speech recognition.
2270-2273
ASR:
Lexical and Pronunciation Modeling
- Chi Hu, Xiaodan Zhuang, Mark Hasegawa-Johnson:
FSM-based pronunciation modeling using articulatory phonological code.
2274-2277
- Denis Jouvet, Dominique Fohr, Irina Illina:
Detailed pronunciation variant modeling for speech transcription.
2278-2281
- Line Adde, Bert Réveil, Jean-Pierre Martens, Torbjørn Svendsen:
A minimum classification error approach to pronunciation variation modeling of non-native proper names.
2282-2285
- Antoine Laurent, Sylvain Meignier, Téva Merlin, Paul Deléglise:
Acoustics-based phonetic transcription method for proper nouns.
2286-2289
- Tim Schlippe, Sebastian Ochs, Tanja Schultz:
Wiktionary as a source for automatic pronunciation extraction.
2290-2293
- Ibrahim Badr, Ian McGraw, James R. Glass:
Learning new word pronunciations from spoken examples.
2294-2297
Speaker Recognition and Diarization
- I-Fan Chen, Shih-Sian Cheng, Hsin-Min Wang:
Phonetic subspace mixture model for speaker diarization.
2298-2301
- Martin Zelenák, Carlos Segura, Javier Hernando:
Overlap detection for speaker diarization by fusing spectral and spatial features.
2302-2305
- Alfred Dielmann, Giulia Garau, Hervé Bourlard:
Floor holder detection and end of speaker turn prediction in meetings.
2306-2309
- Carlos Vaquero, Alfonso Ortega, Jesús A. Villalba, Antonio Miguel, Eduardo Lleida:
Confidence measures for speaker segmentation and their relation to speaker verification.
2310-2313
- Anthony Larcher, Christophe Lévy, Driss Matrouf, Jean-François Bonastre:
Decoupling session variability modelling and speaker characterisation.
2314-2317
- Cheung-Chi Leung, Donglai Zhu, Kong-Aik Lee, Bin Ma, Haizhou Li:
Incorporating MAP estimation and covariance transform for SVM based speaker recognition.
2318-2321
Speech and Audio Classification
- Stéphane Rossignol, Olivier Pietquin:
Single-speaker/multi-speaker co-channel speech classification.
2322-2325
- Oriol Vinyals, Gerald Friedland, Nelson Morgan:
Discriminative training for hierarchical clustering in speaker diarization.
2326-2329
- Jürgen T. Geiger, Frank Wallhoff, Gerhard Rigoll:
GMM-UBM based open-set online speaker diarization.
2330-2333
- Ladan Golipour, Douglas D. O'Shaughnessy:
A segment-based non-parametric approach for monophone recognition.
2334-2337
- Taras Butko, Climent Nadeu:
A fast one-pass-training feature selection technique for GMM-based acoustic event detection with audio-visual data.
2338-2341
- Nobuhide Yamakawa, Tetsuro Kitahara, Toru Takahashi, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno:
Effects of modelling within- and between-frame temporal variations in power spectra on non-verbal sound recognition.
2342-2345
Emotion Recognition
- Ling He, Margaret Lech, Nicholas Allen:
On the importance of glottal flow spectral energy for the recognition of emotions in speech.
2346-2349
- Laurence Devillers, Christophe Vaudable, Clément Chastagnol:
Real-life emotion-related states detection in call centers: a cross-corpora study.
2350-2353
- Ali Hassan, Robert I. Damper:
Multi-class and hierarchical SVMs for emotion recognition.
2354-2357
- David Philippou-Hübner, Bogdan Vlasenko, Tobias Grosser, Andreas Wendemuth:
Determining optimal features for emotion recognition from speech by applying an evolutionary algorithm.
2358-2361
- Martin Wöllmer, Angeliki Metallinou, Florian Eyben, Björn Schuller, Shrikanth S. Narayanan:
Context-sensitive multimodal emotion recognition from speech and facial expression using bidirectional LSTM modeling.
2362-2365
- Kartik Audhkhasi, Shrikanth S. Narayanan:
Data-dependent evaluator modeling and its application to emotional valence classification from speech.
2366-2369
Speech Coding, Modeling, and Transmission
- Zhanyu Ma, Arne Leijon:
Modelling speech line spectral frequencies with Dirichlet mixture models.
2370-2373
- Zhanyu Ma, Arne Leijon:
PDF-optimized LSF vector quantization based on beta mixture models.
2374-2377
- Jose Enrique Garcia, Alfonso Ortega, Antonio Miguel, Eduardo Lleida:
Non-linear predictive vector quantization of feature vectors for distributed speech recognition.
2378-2381
- Lasse Laaksonen, Mikko Tammi, Vladimir Malenovsky, Tommy Vaillancourt, Mi Suk Lee, Tomofumi Yamanashi, Masahiro Oshikiri, Claude Lamblin, Balázs Kövesi, Lei Miao, Deming Zhang, Jon Gibbs, Holly Francois:
Superwideband extension of G.718 and G.729.1 speech codecs.
2382-2385
- José L. Carmona, Angel M. Gomez, Antonio M. Peinado, José L. Pérez-Córdoba, José A. González:
A multipulse FEC scheme based on amplitude estimation for CELP codecs over packet networks.
2386-2389
- Anssi Rämö, Henri Toukomaa:
Voice quality evaluation of recent open source codecs.
2390-2393
- Bengt J. Borgström, Per Henrik Borgström, Abeer Alwan:
Efficient HMM-based estimation of missing features, with applications to packet loss concealment.
2394-2397
- Xiaoqiang Xiao, Robert M. Nickel:
Speech inventory based discriminative training for joint speech enhancement and low-rate speech coding.
2398-2401
- Qipeng Gong, Peter Kabal:
Quality-based playout buffering with FEC for conversational VoIP.
2402-2405
- Masatsune Tamura, Takehiko Kagoshima, Masami Akamine:
Sub-band basis spectrum model for pitch-synchronous log-spectrum and phase based on approximation of sparse coding.
2406-2409
- Sundar Harshavardhan, Chandra Sekhar Seelamantula, Thippur V. Sreenivas:
A multimodal density function estimation approach to formant tracking.
2410-2413
- Heikki Rasilo, Unto K. Laine, Okko Johannes Räsänen:
Estimation studies of vocal tract shape trajectory using a variable length and lossy Kelly-Lochbaum model.
2414-2417
Speech Perception:
Processing and Intelligibility
- Serajul Haque, Roberto Togneri:
A feature extraction method for automatic speech recognition based on the cochlear nucleus.
2454-2457
- Samuel Thomas, Kailash Patil, Sriram Ganapathy, Nima Mesgarani, Hynek Hermansky:
A phoneme recognition framework based on auditory spectro-temporal receptive fields.
2458-2461
- Amy V. Beeston, Guy J. Brown:
Perceptual compensation for effects of reverberation in speech identification: a computer model based on auditory efferent processing.
2462-2465
- Barbara Schuppler, Mirjam Ernestus, Wim A. van Dommelen, Jacques C. Koreman:
Predicting human perception and ASR classification of word-final [t] by its acoustic sub-segmental properties.
2466-2469
- Matthew Robertson, Guy J. Brown, Wendy Lecluyse, Manasa Panda, Christine M. Tan:
A speech-in-noise test based on spoken digits: comparison of normal and impaired listeners using a computer model.
2470-2473
- Takayuki Kagomiya, Seiji Nakagawa:
Evaluation of bone-conducted ultrasonic hearing-aid regarding transmission of paralinguistic information: a comparison with cochlear implant simulator.
2474-2477
- Tim Jürgens, Stefan Fredelake, Ralf M. Meyer, Birger Kollmeier, Thomas Brand:
Challenging the speech intelligibility index: macroscopic vs. microscopic prediction of sentence recognition in normal and hearing-impaired listeners.
2478-2481
- Verena N. Uslar, Thomas Brand, Mirko Hanke, Rebecca Carroll, Esther Ruigendijk, Cornelia Hamann, Birger Kollmeier:
Does sentence complexity interfere with intelligibility in noise? Evaluation of the Oldenburg linguistically and audiologically controlled sentence test (OLACS).
2482-2485
- Juan-Pablo Ramirez, Hamed Ketabdar, Alexander Raake:
Intelligibility predictions for speech against fluctuating masker.
2486-2489
- Masashi Ito, Keiji Ohara, Akinori Ito, Masafumi Yano:
An effect of formant amplitude in vowel perception.
2490-2493
- Christopher I. Petkov, Benjamin Wilson:
Functional imaging of brain regions sensitive to communication sounds in primates.
2494-2497
Spoken Language Understanding and Spoken Language Translation I, II
- Ye-Yi Wang:
Strategies for statistical spoken language understanding with small amount of data - an empirical study.
2498-2501
- Bassam Jabaian, Laurent Besacier, Fabrice Lefèvre:
Investigating multiple approaches for SLU portability to a new language.
2502-2505
- Anja Austermann, Seiji Yamada, Kotaro Funakoshi, Mikio Nakano:
Learning naturally spoken commands for a robot.
2506-2509
- Amparo Albalate, Aparna Suchindranath, David Suendermann, Wolfgang Minker:
A semi-supervised cluster-and-label approach for utterance classification.
2510-2513
- Silvia Quarteroni, Giuseppe Riccardi:
Classifying dialog acts in human-human and human-machine spoken conversations.
2514-2517
- Fei Liu, Yang Liu:
Exploring speaker characteristics for meeting summarization.
2518-2521
- Shasha Xie, Hui Lin, Yang Liu:
Semi-supervised extractive speech summarization via co-training algorithm.
2522-2525
- Asli Çelikyilmaz, Dilek Hakkani-Tür:
Extractive summarization using a latent variable model.
2526-2529
- Emil Ettelaie, Panayiotis G. Georgiou, Shrikanth S. Narayanan:
Hierarchical classification for speech-to-speech translation.
2530-2533
- Matthias Paulik, Alex Waibel:
Rapid development of speech translation using consecutive interpretation.
2534-2537
- Sameer Maskey, Steven J. Rennie, Bowen Zhou:
Combining many alignments for speech to speech translation.
2538-2541
- Pierre Gotab, Géraldine Damnati, Frédéric Béchet, Lionel Delphin-Poulat:
Online SLU model adaptation with a partial oracle.
2862-2865
- Om Deshmukh, Harish Doddala, Ashish Verma, Karthik Visweswariah:
Role of language models in spoken fluency evaluation.
2866-2869
- Sibel Yaman, Dilek Hakkani-Tür, Gökhan Tür:
Social role discovery from spoken language using dynamic Bayesian networks.
2870-2873
- Michelle Hewlett Sanchez, Gökhan Tür, Luciana Ferrer, Dilek Hakkani-Tür:
Domain adaptation and compensation for emotion detection.
2874-2877
- Sankaranarayanan Ananthakrishnan, Rohit Prasad, Prem Natarajan:
Phrase alignment confidence for statistical machine translation.
2878-2881
- Ian R. Lane, Alex Waibel:
Named-entity projection and data-driven morphological decomposition for field maintainable speech-to-speech translation systems.
2882-2885
Social Signals in Speech (Special Session)
- Paul M. Brunet, Marcela Charfuelan, Roderick Cowie, Marc Schröder, Hastings Donnan, Ellen Douglas-Cowie:
Detecting politeness and efficiency in a cooperative social interaction.
2542-2545
- Nick Campbell, Stefan Scherer:
Comparing measures of synchrony and alignment in dialogue speech timing with respect to turn-taking activity.
2546-2549
- Emina Kurtic, Guy J. Brown, Bill Wells:
Resources for turn competition in overlap in multi-party conversations: speech rate, pausing and duration.
2550-2553
- Khiet P. Truong, Dirk Heylen:
Disambiguating the functions of conversational sounds with prosody: the case of yeah.
2554-2557
- Marcela Charfuelan, Marc Schröder, Ingmar Steiner:
Prosody and voice quality of vocal social signals: the case of dominance in scenario meetings.
2558-2561
- Daniel Neiberg, Joakim Gustafson:
The prosody of Swedish conversational grunts.
2562-2565
Physiology and Pathology of Spoken Language
- Christophe Mertens, Francis Grenez, Lise Crevier-Buchman, Jean Schoentgen:
Reliable tracking based on speech sample salience of vocal cycle length perturbations.
2566-2569
- Hideki Kasuya, Hajime Yoshida, Satoshi Ebihara, Hiroki Mori:
Longitudinal changes of selected voice source parameters.
2570-2573
- Ali Alpan, Jean Schoentgen, Youri Maryn, Francis Grenez:
Automatic perceptual categorization of disordered connected speech.
2574-2577
- Heejin Kim, Panying Rong, Torrey M. Loucks, Mark Hasegawa-Johnson:
Kinematic analysis of tongue movement control in spastic dysarthria.
2578-2581
- Irene Jacobi, Lisette van der Molen, Maya van Rossum, Frans Hilgers:
Pre- and short-term posttreatment vocal functioning in patients with advanced head and neck cancer treated with concomitant chemoradiotherapy.
2582-2585
- Joan K. Y. Ma, Rüdiger Hoffmann:
Acoustic analysis of intonation in Parkinson's disease.
2586-2589
Speaker Diarization
- Carlos Vaquero, Oriol Vinyals, Gerald Friedland:
A hybrid approach to online speaker diarization.
2638-2641
- Simon Bozonnet, Nicholas W. D. Evans, Xavier Anguera, Oriol Vinyals, Gerald Friedland, Corinne Fredouille:
System output combination for improved speaker diarization.
2642-2645
- Simon Bozonnet, Nicholas W. D. Evans, Corinne Fredouille, Dong Wang, Raphaël Troncy:
An integrated top-down/bottom-up approach to speaker diarization.
2646-2649
- Deepu Vijayasenan, Fabio Valente, Hervé Bourlard:
Advances in fast multistream diarization based on the information bottleneck framework.
2650-2653
- Giulia Garau, Alfred Dielmann, Hervé Bourlard:
Audio-visual synchronisation for speaker diarisation.
2654-2657
- Kyu Jeong Han, Shrikanth S. Narayanan:
An improved cluster model selection method for agglomerative hierarchical speaker clustering using incremental Gaussian mixture models.
2658-2661
- Nigel G. Ward, Olac Fuentes, Alejandro Vega:
Dialog prediction for a general model of turn-taking.
2662-2665
- Tobias Herbig, Franz Gerl, Wolfgang Minker:
Speaker tracking in an unsupervised speech controlled system.
2666-2669
- Paula Lopez-Otero, Laura Docío Fernández, Carmen García-Mateo:
MultiBIC: an improved speaker segmentation technique for TV shows.
2670-2673
Multi-Modal ASR, Including Audio-Visual ASR
- John-Paul Hosom, Tom Jakobs, Allen Baker, Susan Fager:
Automatic speech recognition for assistive writing in speech supplemented word prediction.
2674-2677
- Alexey Karpov, Andrey Ronzhin, Konstantin Markov, Milos Zelezný:
Viseme-dependent weight optimization for CHMM-based audio-visual speech recognition.
2678-2681
- Louis H. Terry, Karen Livescu, Janet B. Pierrehumbert, Aggelos K. Katsaggelos:
Audio-visual anticipatory coarticulation modeling by human and machine.
2682-2685
- Matthias Janke, Michael Wand, Tanja Schultz:
Impact of lack of acoustic feedback in EMG-based silent speech recognition.
2686-2689
- Chong-Jia Ni, Wenju Liu, Bo Xu:
Using prosody to improve Mandarin automatic speech recognition.
2690-2693
- Satoshi Tamura, Masato Ishikawa, Takashi Hashiba, Shin'ichi Takeuchi, Satoru Hayamizu:
A robust audio-visual speech recognition using audio-visual voice activity detection.
2694-2697
- Dorothea Kolossa, Jike Chong, Steffen Zeiler, Kurt Keutzer:
Efficient manycore CHMM speech recognition for audiovisual and multistream data.
2698-2701
- Takami Yoshida, Kazuhiro Nakadai:
Two-layered audio-visual integration in voice activity detection and automatic speech recognition for robots.
2702-2705
- Panikos Heracleous, Norihiro Hagita:
Non-audible murmur recognition based on fusion of audio and visual streams.
2706-2709
Speaker and Language Recognition
- Mohamed Faouzi BenZeghiba, Jean-Luc Gauvain, Lori Lamel:
Improved n-gram phonotactic models for language recognition.
2710-2713
- Sirinoot Boonsuk, Donglai Zhu, Bin Ma, Atiwong Suchato, Proadpran Punyabukkana, Nattanun Thatphithakkul, Chai Wutiwiwatchai:
A study of term weighting in phonotactic approach to spoken language recognition.
2714-2717
- Sabato Marco Siniscalchi, Jeremy Reed, Torbjørn Svendsen, Chin-Hui Lee:
Exploiting context-dependency and acoustic resolution of universal speech attribute models in spoken language recognition.
2718-2721
- David Imseng, Mathew Magimai-Doss, Hervé Bourlard:
Hierarchical multilayer perceptron based language identification.
2722-2725
- Alvin F. Martin, Craig S. Greenberg:
The NIST 2010 speaker recognition evaluation.
2726-2729
- Shih-Sian Cheng, I-Fan Chen, Hsin-Min Wang:
Bayesian speaker recognition using Gaussian mixture model and Laplace approximation.
2730-2733
- Tomi Kinnunen, Rahim Saeidi, Johan Sandberg, Maria Hansson-Sandsten:
What else is new than the Hamming window? Robust MFCCs for speaker recognition via multitapering.
2734-2737
- Achintya Kumar Sarkar, Srinivasan Umesh:
Fast computation of speaker characterization vector using MLLR and sufficient statistics in anchor model framework.
2738-2741
- Zahi N. Karam, William M. Campbell:
Graph-embedding for speaker recognition.
2742-2745
- Chang Huai You, Haizhou Li, Kong-Aik Lee:
A hybrid modeling strategy for GMM-SVM speaker recognition with adaptive relevance factor.
2746-2749
- Sundar Harshavardhan, Thippur V. Sreenivas:
Robust mixture modeling using t-distribution: application to speaker ID.
2750-2753
- Chi-Sang Jung, Kyu Jeong Han, Hyunson Seo, Shrikanth S. Narayanan, Hong-Goo Kang:
A variable frame length and rate algorithm based on the spectral kurtosis measure for speaker verification.
2754-2757
Source Localization and Separation
- Kohei Hayashida, Masanori Morise, Takanobu Nishiura:
Near field sound source localization based on cross-power spectrum phase analysis with multiple microphones.
2758-2761
- Jinho Choi, Chang D. Yoo:
A maximum a posteriori sound source localization in reverberant and noisy conditions.
2762-2765
- Tomohiro Nakatani, Shoko Araki, Takuya Yoshioka, Masakiyo Fujimoto:
Multichannel source separation based on source location cue with log-spectral shaping by hidden Markov source model.
2766-2769
- Duc Thanh Chau, Junfeng Li, Masato Akagi:
A DOA estimation algorithm based on equalization-cancellation theory.
2770-2773
- Tania Habib, Harald Romsdorfer:
Concurrent speaker localization using multi-band position-pitch (m-popi) algorithm with spectro-temporal pre-processing.
2774-2777
- Ji-Hyun Song, Kyu-Ho Lee, Yun-Sik Park, Sang-Ick Kang, Joon-Hyuk Chang:
On using Gaussian mixture model for double-talk detection in acoustic echo suppression.
2778-2781
- Cemil Demir, A. Taylan Cemgil, Murat Saraclar:
Catalog-based single-channel speech-music separation.
2782-2785
- Ke Hu, DeLiang Wang:
Unvoiced speech segregation based on CASA and spectral subtraction.
2786-2789
- Ke Hu, DeLiang Wang:
Unsupervised sequential organization for cochannel speech separation.
2790-2793
INTERSPEECH 2010 Paralinguistic Challenge (Special Session)
- Björn Schuller, Stefan Steidl, Anton Batliner, Felix Burkhardt, Laurence Devillers, Christian A. Müller, Shrikanth S. Narayanan:
The INTERSPEECH 2010 paralinguistic challenge.
2794-2797
- Florian Lingenfelser, Johannes Wagner, Thurid Vogt, Jonghwa Kim, Elisabeth André:
Age and gender classification from speech using decision level fusion and ensemble based techniques.
2798-2801
- Je Hun Jeon, Rui Xia, Yang Liu:
Level of interest sensing in spoken dialog using multi-level fusion of acoustic and lexical evidence.
2802-2805
- Phuoc Nguyen, Trung Le, Dat Tran, Xu Huang, Dharmendra Sharma:
Fuzzy support vector machines for age and gender classification.
2806-2809
- Rok Gajsek, Janez Zibert, Tadej Justin, Vitomir Struc, Bostjan Vesnicer, France Mihelic:
Gender and affect recognition based on GMM and GMM-UBM modeling with relevance MAP estimation.
2810-2813
- Royi Porat, Dan Lange, Yaniv Zigel:
Age recognition based on speech signals using weights supervector.
2814-2817
- Hugo Meinedo, Isabel Trancoso:
Age and gender classification using fusion of acoustic and prosodic features.
2818-2821
- Marcel Kockmann, Lukas Burget, Jan Cernocký:
Brno University of Technology system for INTERSPEECH 2010 paralinguistic challenge.
2822-2825
- Ming Li, Chi-Sang Jung, Kyu Jeong Han:
Combining five acoustic level modeling methods for automatic speaker age and gender recognition.
2826-2829
- Tobias Bocklet, Georg Stemmer, Viktor Zeißler, Elmar Nöth:
Age and gender recognition based on multiple systems - early vs. late fusion.
2830-2833
- Michael Feld, Felix Burkhardt, Christian A. Müller:
Automatic speaker age and gender recognition in the car for tailoring dialog and mobile services.
2834-2837
Signal Processing for Music and Song
- Kiyoaki Aikawa, Junko Uenuma, Tomoko Akitake:
Acoustic correlates of voice quality improvement by voice training.
2886-2889
- Minghui Dong, Paul Y. Chan, Ling Cen, Haizhou Li, Jason Teo, Ping Jen Kua:
Phonetic segmentation of singing voice using MIDI and parallel speech.
2890-2893
- Keijiro Saino, Makoto Tachibana, Hideki Kenmochi:
A singing style modeling system for singing voice synthesizers.
2894-2897
- Jingzhou Yang, Jia Liu, Weiqiang Zhang:
A fast query by humming system based on notes.
2898-2901
- Seokhwan Jo, Sihyun Joo, Chang D. Yoo:
Melody pitch estimation based on range estimation and candidate extraction using harmonic structure model.
2902-2905
- Jihoon Park, Kwang-Ki Kim, Jeongil Seo, Minsoo Hahn:
Modified spatial audio object coding scheme with harmonic extraction and elimination structure for interactive audio service.
2906-2909
Modeling First Language Acquisition
- Christina Bergmann, Michele Gubian, Lou Boves:
Modelling the effect of speaker familiarity and noise on infant word recognition.
2910-2913
- Kouki Miyazawa, Hideaki Kikuchi, Reiko Mazuka:
Unsupervised learning of vowels from continuous speech based on self-organized phoneme acquisition model.
2914-2917
- Andrew R. Plummer, Mary E. Beckman, Mikhail Belkin, Eric Fosler-Lussier, Benjamin Munson:
Learning speaker normalization using semisupervised manifold alignment.
2918-2921
- Okko Johannes Räsänen:
Fully unsupervised word learning from continuous speech using transitional probabilities of atomic acoustic events.
2922-2925
- Louis ten Bosch, Lou Boves:
Language acquisition and cross-modal associations: computational simulation of the result of infant studies.
2926-2929
- Maarten Versteegh, Louis ten Bosch, Lou Boves:
Active word learning under uncertain input conditions.
2930-2933
Discourse and Dialogue
- Rémi Lavalley, Chloé Clavel, Patrice Bellot, Marc El-Bèze:
Combining text categorization and dialog modeling for speaker role identification on call center conversations.
3062-3065
- Akira Nakamura, Satoru Hayamizu:
Topic-dependent n-gram models based on optimization of context lengths in LDA.
3066-3069
- Nicolas Obin, Volker Dellwo, Anne Lacheret, Xavier Rodet:
Expectations for discourse genre identification: a prosodic study.
3070-3073
- Ramón Granell, Stephen G. Pulman, Carlos D. Martínez-Hinarejos, José-Miguel Benedí:
Dialogue act tagging and segmentation with a single perceptron.
3074-3077
- Yasuhisa Fujii, Kazumasa Yamamoto, Seiichi Nakagawa:
Improving the readability of class lecture ASR results using a confusion network.
3078-3081
Voice Activity and Turn Detection
- Sang-Kyun Kim, Jae-Hun Choi, Sang-Ick Kang, Ji-Hyun Song, Joon-Hyuk Chang:
Toward detecting voice activity employing soft decision in second-order conditional MAP.
3082-3085
- Xugang Lu, Masashi Unoki, Ryosuke Isotani, Hisashi Kawai, Satoshi Nakamura:
Voice activity detection in a regularized reproducing kernel Hilbert space.
3086-3089
- Ji Wu, Xiao-Lei Zhang, Wei Li:
A new VAD framework using statistical model and human knowledge based empirical rule.
3090-3093
- Mark C. Huggins, Brett Y. Smolenski, Aaron D. Lawson:
Adaptive high accuracy approaches to speech activity detection in noisy and hostile audio environments.
3094-3097
- Prasanta Kumar Ghosh, Andreas Tsiartas, Panayiotis G. Georgiou, Shrikanth S. Narayanan:
Robust voice activity detection in stereo recording with crosstalk.
3098-3101
- Masakiyo Fujimoto, Shinji Watanabe, Tomohiro Nakatani:
Voice activity detection using frame-wise model re-estimation method based on Gaussian pruning with weight normalization.
3102-3105
- Bowon Lee, Debargha Mukherjee:
Spectral entropy-based voice activity detector for videoconferencing systems.
3106-3109
- David Dean, Sridha Sridharan, Robert Vogt, Michael Mason:
The QUT-NOISE-TIMIT corpus for the evaluation of voice activity detection algorithms.
3110-3113
- Tao Yu, John H. L. Hansen:
A Bayesian approach to voice activity detection using multiple statistical models and discriminative training.
3114-3117
- Houman Ghaemmaghami, Brendan Baker, Robert Vogt, Sridha Sridharan:
Noise robust voice activity detection using features extracted from the time-domain autocorrelation function.
3118-3121
- Tasuku Oonishi, Koji Iwano, Sadaoki Furui:
VAD-measure-embedded decoder with online model adaptation.
3122-3125
- Shiwen Deng, Jiqing Han:
Robust statistical voice activity detection using a likelihood ratio sign test.
3126-3129
- Alexei V. Ivanov, Giuseppe Riccardi:
Automatic turn segmentation in spoken conversations.
3130-3133
- Yohei Kawaguchi, Masahito Togami, Yasunari Obuchi:
Turn taking-based conversation detection by using DOA estimation.
3134-3137
Last update Fri May 25 08:23:14 2012 CET by the DBLP Team. Data released under the ODC-BY 1.0 license.