INTERSPEECH 2007: Antwerp, Belgium
INTERSPEECH 2007, 8th Annual Conference of the International Speech Communication Association, Antwerp, Belgium, August 27-31, 2007.
ISCA 2007
Keynotes 1-4
Discriminative and Large Margin Techniques in Acoustic Modeling
- Jinyu Li, Chin-Hui Lee:
Soft margin feature extraction for automatic speech recognition.
30-33
- Yan Yin, Hui Jiang:
A fast optimization method for large margin estimation of HMMs based on second order cone programming.
34-37
- Hao-Zheng Li, Douglas D. O'Shaughnessy:
Frame margin probability discriminative training algorithm for noisy speech recognition.
38-41
- Fabio Valente, Jithendra Vepa, Christian Plahl, Christian Gollan, Hynek Hermansky, Ralf Schlüter:
Hierarchical neural networks feature extraction for LVCSR system.
42-45
- Peder A. Olsen, John R. Hershey:
Bhattacharyya error and divergence using variational importance sampling.
46-49
- Tingyao Wu, Jacques Duchateau, Dirk Van Compernolle:
Phoneme dependent frame selection preference.
50-53
Speech Production I, II
- Xinhui Zhou, Carol Y. Espy-Wilson, Mark Tiede, Suzanne Boyce:
An articulatory and acoustic study of "retroflex" and "bunched" American English rhotic sound based on MRI.
54-57
- Paula Martins, Inês Carbone, Augusto Silva, António J. S. Teixeira:
An MRI study of European Portuguese nasals.
58-61
- Sayoko Takano, Hiroki Matsuzaki, Kunitoshi Motoki:
A four-cube FEM model of the extrinsic and intrinsic tongue muscles to simulate the production of vowel /i/.
62-65
- Juan F. Torres, Elliot Moore:
Performance evaluation of glottal quality measures from the perspective of vocal tract filter consistency.
66-69
- Veena D. Singampalli, Philip J. B. Jackson:
Statistical identification of critical, dependent and redundant articulators.
70-73
- Chao Qin, Miguel Á. Carreira-Perpiñán:
An empirical investigation of the nonuniqueness in the acoustic-to-articulatory mapping.
74-77
Phonetic Segmentation and Classification I, II
- Peter Karsmakers, Kristiaan Pelckmans, Johan A. K. Suykens, Hugo Van hamme:
Fixed-size kernel logistic regression for phoneme classification.
78-81
- Seung Seop Park, Jong Won Shin, Jong Kyu Kim, Nam Soo Kim:
A multiple-model based framework for automatic speech segmentation.
82-85
- Aren Jansen, Partha Niyogi:
Semi-supervised learning of speech sounds.
86-89
- Abhinav Parate, Ashish Verma, Jayanta Basak:
Evaluation of syllable stress using single class classifier.
90-93
- Mohammad Nurul Huda, Muhammad Ghulam, Junsei Horikawa, Tsuneo Nitta:
Distinctive phonetic feature (DPF) based phone segmentation using hybrid neural networks.
94-97
- J.-Ph. Goldman, Mathieu Avanzi, Anne-Catherine Simon, Anne Lacheret, A. Auchlin:
A methodology for the automatic detection of perceived prominent syllables in spoken French.
98-101
Discourse, Dialog and Conversation
Spoken Dialog Systems I, II
- Craig Wootton, Michael F. McTear, Terry Anderson:
Utilizing online content as domain knowledge in a multi-domain dynamic dialogue system.
122-125
- Boris W. van Schooten, Sophie Rosset, Olivier Galibert, Aurélien Max, Rieks op den Akker, Gabriel Illouz:
Handling speech input in the Ritel QA dialogue system.
126-129
- Woosung Kim:
Online call quality monitoring for automating agent-based call centers.
130-133
- Sebastian Möller, Klaus-Peter Engelbrecht, Antti Oulasvirta:
Analysis of communication failures for spoken dialogue systems.
134-137
- Sandra Mann, André Berton, Ute Ehrlich:
How to access audio files of large data bases using in-car speech dialogue systems.
138-141
- Kazunori Komatani, Tatsuya Kawahara, Hiroshi G. Okuno:
Analyzing temporal transition of real user's behaviors in a spoken dialogue system.
142-145
- J. Sherwani, Dong Yu, Tim Paek, Mary Czerwinski, Yun-Cheng Ju, Alex Acero:
Voicepedia: towards speech-based access to unstructured information.
146-149
- Vivek Kumar Rangarajan Sridhar, Srinivas Bangalore, Shrikanth S. Narayanan:
Exploiting prosodic features for dialog act tagging in a discriminative modeling framework.
150-153
- Hua Ai, Antonio Roque, Anton Leuski, David R. Traum:
Using information state to improve dialogue move identification in a spoken dialogue system.
154-157
- Shiu-Wah Chu, Ian M. O'Neill, Philip Hanna:
Using multiple strategies to manage spoken dialogue.
158-161
- Marcelo Quinderé, Luís Seabra Lopes, António J. S. Teixeira:
An information state based dialogue manager for a mobile robot.
162-165
Accent and Language Identification I, II
- Josef G. Bauer, Bernt Andrassy, Ekaterina Timoshenko:
Discriminative optimization of language adapted HMMs for a language identification system based on parallel phoneme recognizers.
166-169
- Khe Chai Sim, Haizhou Li:
Fusion of contrastive acoustic models for parallel phonotactic spoken language identification.
170-173
- Liang Wang, Eliathamby Ambikairajah, Eric H. C. Choi:
Multi-layer Kohonen self-organizing feature map for language identification.
174-177
- Bo Yin, Eliathamby Ambikairajah, Fang Chen:
Hierarchical language identification based on automatic language clustering.
178-181
- Ekaterina Timoshenko, Harald Höge:
Using speech rhythm for acoustic language identification.
182-185
- Kakeung Wong, Man-Hung Siu, Brian Mak:
A model-based estimation of phonotactic language verification performance.
186-189
- Mike Rosner, Paulseph-John Farrugia:
A tagging algorithm for mixed language identification in a noisy domain.
190-193
- Doroteo Torre Toledano, Javier Gonzalez-Dominguez, Alejandro Abejón-Gonzalez, Danilo Spada, Ismael Mateos-Garcia, Joaquin Gonzalez-Rodriguez:
Improved language recognition using better phonetic decoders and fusion with MFCC and SDC features.
194-197
Education and Training
- Daniel Bolaños, Wayne Ward, Sarel van Vuuren, Javier Garrido:
Syllable lattices as a basis for a children's speech reading tracker.
198-201
- Fuping Pan, Qingwei Zhao, Yonghong Yan:
Mandarin vowel pronunciation quality evaluation by using formant pattern recognition.
202-205
- Matthew Black, Joseph Tepperman, Sungbok Lee, Patti Price, Shrikanth S. Narayanan:
Automatic detection and classification of disfluent reading miscues in young children's speech for the purpose of assessment.
206-209
- Nobuaki Minematsu, K. Kamata, Satoshi Asakawa, T. Makino, Tazuko Nishimura, Keikichi Hirose:
Structural assessment of language learners' pronunciation.
210-213
- Abdurrahman Samir, Sherif Mahdy Abdou, Ahmed Husien Khalil, Mohsen Rashwan:
Enhancing usability of CAPL system for Qur'an recitation learning.
214-217
- Febe de Wet, Christa van der Walt, Thomas Niesler:
Automatic large-scale oral language proficiency assessment.
218-221
Robust ASR I, II
- Yuki Denda, Takamasa Tanaka, Masato Nakayama, Takanobu Nishiura, Yoichi Yamashita:
Noise-robust hands-free voice activity detection with adaptive zero crossing detection using talker direction estimation.
222-225
- Agustín Álvarez Marquina, Rafael Martínez, Pedro Gómez Vilda, Victor Nieto Lluis, V. Rodellar:
A robust mel-scale subband voice activity detector for a car platform.
226-229
- Kentaro Ishizuka, Tomohiro Nakatani, Masakiyo Fujimoto, Noboru Miyazaki:
Noise robust front-end processing with voice activity detection based on periodic to aperiodic component ratio.
230-233
- A. M. Toh, Roberto Togneri, Sven Nordholm:
Feature and distribution normalization schemes for statistical mismatch reduction in reverberant speech recognition.
234-237
- Matthew Gibson, Thomas Hain:
Temporal masking for unsupervised minimum Bayes risk speaker adaptation.
238-241
- Tsung-hsueh Hsieh, Jeih-Weih Hung:
Speech feature compensation based on pseudo stereo codebooks for robust speech recognition in additive noise environments.
242-245
- Dimitrios Dimitriadis, Petros Maragos, Stamatios Lefkimmiatis:
Multiband, multisensor robust features for noisy speech recognition.
246-249
- Akira Sasou, Hiroaki Kojima:
Noise robust speech recognition for voice driven wheelchair.
250-253
Adaptation in ASR I, II
- Yun Tang, Richard C. Rose:
Clustered maximum likelihood linear basis for rapid speaker adaptation.
254-257
- Wen Xuan Teng, Guillaume Gravier, Frédéric Bimbot, Frédéric Soufflet:
Rapid speaker adaptation by reference model interpolation.
258-261
- Randy Gomez, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano:
Rapid unsupervised speaker adaptation using single utterance based on MLLR and speaker selection.
262-265
- Brian Kan-Wing Mak, Roger Wend-Huu Hsiao:
Robustness of several kernel-based fast adaptation methods on noisy LVCSR.
266-269
- Janne Pylkkönen:
Estimating VTLN warping factors by distribution matching.
270-273
- Ming Liu, Xi Zhou, Mark Hasegawa-Johnson, Thomas S. Huang, Zhengyou Zhang:
Frequency domain correspondence for speaker normalization.
274-277
- Masafumi Nishida, Yasuo Horiuchi, Akira Ichikawa:
Unsupervised training of adaptation rate using Q-learning in large vocabulary continuous speech recognition.
278-281
- Martin Karafiát, Lukás Burget, Jan Cernocký, Thomas Hain:
Application of CMLLR in narrow band wide band adapted systems.
282-285
- Christophe Lévy, Georges Linarès, Jean-François Bonastre:
Fast adaptation of GMM-based compact models.
286-289
Speaker Verification & Identification I-IV
- Zahi N. Karam, William M. Campbell:
A new kernel for SVM MLLR based speaker recognition.
290-293
- Kong-Aik Lee, Changhuai You, Haizhou Li, Tomi Kinnunen:
A GMM-based probabilistic sequence kernel for speaker verification.
294-297
- Hagai Aronowitz:
Speaker recognition using kernel-PCA and intersession variability modeling.
298-301
- Réda Dehak, Najim Dehak, Patrick Kenny, Pierre Dumouchel:
Linear and non linear kernel GMM supervector machines for speaker verification.
302-305
- Ignacio Lopez-Moreno, Ismael Mateos-Garcia, Daniel Ramos, Joaquin Gonzalez-Rodriguez:
Support vector regression for speaker verification.
306-309
- Chris Longworth, Mark J. F. Gales:
Derivative and parametric kernels for speaker verification.
310-313
Spoken Data Retrieval I, II
- David R. H. Miller, Michael Kleber, Chia-Lin Kao, Owen Kimball, Thomas Colthurst, Stephen A. Lowe, Richard M. Schwartz, Herbert Gish:
Rapid and accurate spoken term detection.
314-317
- Yi-Cheng Pan, Hung-lin Chang, Berlin Chen, Lin-Shan Lee:
Subword-based position specific posterior lattices (s-PSPL) for indexing speech information.
318-321
- Andreas Merkel, Dietrich Klakow:
Improved methods for language model based question classification.
322-325
- Tomoyosi Akiba, Hirofumi Tsujimura:
Error-tolerant question answering for spoken documents.
326-329
- Dilek Z. Hakkani-Tür, Gökhan Tür, Michael Levit:
Exploiting information extraction annotations for document retrieval in distillation tasks.
330-333
- Kishan Thambiratnam, Frank Seide:
Learning spoken document similarity and recommendation using supervised probabilistic latent semantic analysis.
334-337
Accent and Language Identification I, II
- David A. van Leeuwen, Khiet P. Truong:
An open-set detection evaluation methodology applied to language and emotion recognition.
338-341
- Xi Yang, Man-Hung Siu, Herbert Gish, Brian Mak:
Boosting with anti-models for automatic language identification.
342-345
- Fabio Castaldo, Daniele Colibro, Emanuele Dalmasso, Pietro Laface, Claudio Vair:
Acoustic language identification using fast discriminative training.
346-349
- Ming Li, Hongbin Suo, Xiao Wu, Ping Lu, Yonghong Yan:
Spoken language identification using score vector modeling and support vector machine.
350-353
- Ricardo de Córdoba, Luis Fernando D'Haro, Fernando F. Fernández-Martínez, Javier Macías Guarasa, Javier Ferreiros:
Language identification based on n-gram frequency ranking.
354-357
- Wade Shen, Douglas A. Reynolds:
Improving phonotactic language recognition with acoustic adaptation.
358-361
Speech Perception I, II
- Michael C. W. Yip:
Spoken word recognition of Chinese homophones: a further investigation.
362-365
- Maria Wolters, Pauline Campbell, Christine DePlacido, Amy Liddell, David Owens:
The role of outer hair cell function in the perception of synthetic versus natural speech.
366-369
- Akiko Kusumoto, Alexander Kain, John-Paul Hosom, Jan P. H. van Santen:
Hybridizing conversational and clear speech.
370-373
- Sophie Dufour, Ulrich H. Frauenfelder:
Neighborhood density and neighborhood frequency effects in French spoken word recognition.
374-377
- Toshio Irino, Yoshie Aoki, Yoshie Hayashi, Hideki Kawahara, Roy D. Patterson:
Discrimination and recognition of scaled word sounds.
378-381
- László Tóth:
Benchmarking human performance on the acoustic and linguistic subtasks of ASR systems.
382-385
- Lin Yang, Jianping Zhang, Yonghong Yan:
Contributions of temporal fine structure cues to Chinese speech recognition in cochlear implant simulation.
386-389
- Xihong Wu, Jing Chen, Zhigang Yang, Qiang Huang, Mengyuan Wang, Liang Li:
Effect of number of masking talkers on speech-on-speech masking in Chinese.
390-393
- Odile Bagou, Sophie Dufour, Cécile Fougeron, Alain Content, Ulrich H. Frauenfelder:
Do different boundary types induce subtle acoustic cues to which French listeners are sensitive?
394-397
- Svante Stadler, Arne Leijon, Björn Hagerman:
An information theoretic approach to predict speech intelligibility for listeners with normal and impaired hearing.
398-401
- Travis Wade, Bernd Möbius:
Speaking rate effects in a landmark-based phonetic exemplar model.
402-405
- Kazumi Maniwa, Allard Jongman, Travis Wade:
Acoustic correlates of intelligibility enhancements in clearly produced fricatives.
406-409
- Tim Jürgens, Thomas Brand, Birger Kollmeier:
Modelling the human-machine gap in speech reception: microscopic speech intelligibility prediction for normal-hearing subjects with an auditory model.
410-413
- Ayako Ikeno, John H. L. Hansen:
Lombard speech impact on perceptual speaker recognition.
414-417
- Huiwen Goy, Kathleen Pichora-Fuller, Pascal van Lieshout, Gurjit Singh, Bruce Schneider:
Effect of within- and between-talker variability on word identification in noise by younger and older adults.
418-421
- H. Timothy Bunnell, N. Carolyn Schanen, Linda D. Vallino, Thierry G. Morlet, James B. Polikoff, Jennette D. Driscoll, James T. Mantell:
Speech perception in children with speech sound disorder.
422-425
- Huan Wang, Werner Hemmert:
Speech coding and information processing by auditory neurons.
426-429
- Annie C. Gilbert, Victor J. Boucher:
What do listeners attend to in hearing prosodic structures? Investigating the human speech-parser using short-term recall.
430-433
Prosody: Prosodic Structure
Prosodic Modeling I, II
- Jian Yu, Lixing Huang, Jianhua Tao, Xia Wang:
Modeling incompletion phenomenon in Mandarin dialog prosody.
462-465
- Anne Tamm, Kálmán Abari, Gábor Olaszy:
Accent assignment algorithm in Hungarian, based on syntactic analysis.
466-469
- Cheng-Yuan Lin, Pei-Chi Jao, Jyh-Shing Roger Jang:
An effective initial/final duration prediction method for corpus-based singing voice synthesis of Mandarin Chinese.
470-473
- Géza Németh, Márk Fék, Tamás Gábor Csapó:
Increasing prosodic variability of text-to-speech synthesizers.
474-477
- Damien Lolive, Nelly Barbot, Olivier Boëffard:
Unsupervised HMM classification of F0 curves.
478-481
- Ian Read, Stephen Cox:
Automatic pitch accent prediction for text-to-speech synthesis.
482-485
- Xinqiang Ni, Yining Chen, Frank K. Soong, Min Chu, Ping Zhang:
An unsupervised approach to automatic prosodic annotation.
486-489
- Zeynep Inanoglu, Steve Young:
A system for transforming the emotion in speech: combining data-driven conversion techniques for prosody and voice quality.
490-493
- Chen-Yu Chiang, Hsiu-Min Yu, Yih-Ru Wang, Sin-Horng Chen:
An automatic prosody labeling method for Mandarin speech.
494-497
Speech Analysis
Spectral Analysis, Formants and Vocal Tract Models
- Toon van Waterschoot, Marc Moonen:
Linear prediction of audio signals.
518-521
- Carlo Magi, Tomas Bäckström, Paavo Alku:
Stabilised weighted linear prediction - a robust all-pole method for speech processing.
522-525
- Daniel Rudoy, Daniel N. Spendley, Patrick J. Wolfe:
Conditionally linear Gaussian models for estimating vocal tract resonances.
526-529
- Karl Schnell, Arild Lacroix:
Time-varying pre-emphasis and inverse filtering of speech.
530-533
- Joachim Thiemann, Peter Kabal:
Reconstructing audio signals from modified non-coherent Hilbert envelopes.
534-537
- Binh Phu Nguyen, Masato Akagi:
A flexible spectral modification method based on temporal decomposition and Gaussian mixture model.
538-541
- Jonathan Darch, Ben Milner:
A comparison of estimated and MAP-predicted formants and fundamental frequencies with a speech reconstruction application.
542-545
- Huiqun Deng, Douglas D. O'Shaughnessy:
Effect of incomplete glottal closures on estimates of glottal waves via inverse filtering of vowel sounds.
546-549
- Kaustubh Kalgaonkar, Mark A. Clements:
Vocal tract and area function estimation with both lip and glottal losses.
550-553
- S. Guruprasad, B. Yegnanarayana, K. Sri Rama Murty:
Detection of instants of glottal closure using characteristics of excitation source.
554-557
- Nicolas Sturmel, Christophe d'Alessandro, Boris Doval:
A comparative evaluation of the zeros of z transform representation for voice source estimation.
558-561
Speech and Audio Processing for Intelligent Environments
- Aki Härmä:
Ambient telephony: scenarios and research challenges.
562-565
- Yasunari Obuchi, Akio Amano:
Always listening to you: creating exhaustive audio database in home environments.
566-569
- Joerg Schmalenstroeer, Reinhold Haeb-Umbach:
Joint speaker segmentation, localization and identification for streaming audio.
570-573
- Yan-Chen Lu, Martin Cooke, Heidi Christensen:
Active binaural distance estimation for dynamic sources.
574-577
- Bengt J. Borgström, Abeer Alwan:
A packetization and variable bitrate interframe compression scheme for vector quantizer-based distributed speech recognition.
578-581
- Matthias Wölfel:
Channel selection by class separability measures for automatic transcriptions on distant microphones.
582-585
- Danny Wyatt, Tanzeem Choudhury, Jeff Bilmes:
Conversation detection and speaker segmentation in privacy-sensitive situated speech data.
586-589
- Alberto Abad, Carlos Segura, Climent Nadeu, Javier Hernando:
Audio-based approaches to head orientation estimation in a smart-room.
590-593
- Valentin Ion, Reinhold Haeb-Umbach:
Multi-resolution soft features for channel-robust distributed speech recognition.
594-597
Language Modeling I, II
- Yi Su, Frederick Jelinek, Sanjeev Khudanpur:
Large-scale random forest language models for speech recognition.
598-601
- Yuya Akita, Yusuke Nemoto, Tatsuya Kawahara:
PLSA-based topic detection in meetings for adaptation of lexicon and language model.
602-605
- Atsushi Sako, Tetsuya Takiguchi, Yasuo Ariki:
Language modeling using PLSA-based topic HMM.
606-609
- Yi-Cheng Pan, Lin-Shan Lee:
Lexicon adaptation with reduced character error (LARCE) - a new direction in Chinese language modeling.
610-613
- Meng-Sung Wu, Jen-Tzung Chien:
Minimum rank error training for language modeling.
614-617
- Wen Wang, Andreas Stolcke:
Integrating MAP, marginals, and unsupervised language model adaptation.
618-621
Prosody Production and Perception
Multimodal Speech Recognition
- Noureddine Aboutabit, Denis Beautemps, Jeanne Clarke, Laurent Besacier:
A HMM recognition of consonant-vowel syllables from lip contours: the cued speech case.
646-649
- Patrick Lucey, Gerasimos Potamianos, Sridha Sridharan:
A unified approach to multi-pose audio-visual ASR.
650-653
- Rowan Seymour, Darryl Stewart, Ji Ming:
Audio-visual integration for robust speech recognition using maximum weighted stream posteriors.
654-657
- Thomas Hueber, Gérard Chollet, Bruce Denby, Gérard Dreyfus, Maureen Stone:
Continuous-speech phone recognition from ultrasound and optical images of the tongue and lips.
658-661
- Bo Zhu, Timothy J. Hazen, James R. Glass:
Multimodal speech recognition with ultrasonic sensors.
662-665
- David Dean, Patrick Lucey, Sridha Sridharan, Tim Wark:
Fused HMM-adaptation of multi-stream HMMs for audio-visual speech recognition.
666-669
Speech and Other Modalities
- Carlos Toshinori Ishi, Hiroshi Ishiguro, Norihiro Hagita:
Analysis of head motions and speech in spoken dialogue.
670-673
- Lars Bo Larsen, Kasper Løvborg Jensen, Søren Larsen, Morten H. Rasmussen:
A paradigm for mobile speech-centric services.
674-677
- Pavel Campr, Marek Hrúz, Milos Zelezný:
Design and recording of Czech sign language corpus for automatic sign language recognition.
678-681
- Jens Edlund, Jonas Beskow:
Pushy versus meek - using avatars to influence turn-taking behaviour.
682-685
- Michael Wand, Szu-Chen Stan Jou, Tanja Schultz:
Wavelet-based front-end for electromyographic speech recognition.
686-689
- Gaëlle Ferré, Roxane Bertrand, Philippe Blache, Robert Espesser, Stéphane Rauzy:
Intensive gestures in French and their multimodal correlates.
690-693
- Slim Ouni, Kaïs Ouni:
Aspects of visual speech in Arabic.
694-697
- Denis Burnham, Jessica Reynolds, Guillaume Vignali, Sandra Bollwerk, Caroline Jones:
Rigid vs non-rigid face and head motion in phone and tone perception.
698-701
Multimodal/Multimedia Signal Processing
- Hedvig Kjellström, Olov Engwall, Sherif Mahdy Abdou, Olle Bälter:
Audio-visual phoneme classification for pronunciation training applications.
702-705
- Katja Grauwinkel, Britta Dewitt, Sascha Fagel:
Visual information and redundancy conveyed by internal articulator dynamics in synthetic audiovisual speech.
706-709
- Wei Zhou, Zengfu Wang:
A speech rate related lip movement model for speech animation.
710-713
- Guanyong Wu, Jie Zhu:
An extension 2DPCA based visual feature extraction method for audio-visual speech recognition.
714-717
- Soo-jong Lee, Jun Park, Eung-kyeu Kim:
Preventing an external acoustic noise from being misrecognized as a speech recognition object by confirming the lip movement image signal.
718-721
- Gregor Hofer, Hiroshi Shimodaira:
Automatic head motion prediction from speech data.
722-725
- Yuki Denda, Takanobu Nishiura, Yoichi Yamashita:
Omnidirectional audio-visual talker localizer with dynamic feature fusion based on validity and reliability criteria.
726-729
- Nick Campbell, Damien Douxchamps:
Processing image and audio information for recognising discourse participation status through features of face and voice.
730-733
Speaker Verification & Identification I-IV
- José R. Calvo, Rafael Fernández, Gabriel Hernández:
Application of shifted delta cepstral features in speaker verification.
734-737
- Luciana Ferrer, M. Kemal Sönmez, Elizabeth Shriberg:
A smoothing kernel for spatially related features and its application to speaker verification.
738-741
- Delphine Charlet, Mikaël Collet, Frédéric Bimbot:
VZ-norm: an extension of z-norm to the multivariate case for anchor model based speaker verification.
742-745
- Howard Lei, Nikki Mirghafori:
Word-conditioned HMM supervectors for speaker recognition.
746-749
- Wei-Ho Tsai:
Speaker clustering using direct maximization of a BIC-based score.
750-753
- Alexandre Preti, Jean-François Bonastre, Driss Matrouf, François Capman, B. Ravera:
Confidence measure based unsupervised target model adaptation for speaker verification.
754-757
- Huanjun Bao, Ming-Xing Xu, Thomas Fang Zheng:
Emotion attribute projection for speaker recognition on emotional speech.
758-761
- Shi-Xiong Zhang, Man-Wai Mak, Helen M. Meng:
High-level feature-based speaker verification via articulatory phonetic-class pronunciation modeling.
762-765
- T. Yingthawornsuk, H. Kaymaz Keskinpala, D. M. Wilkes, R. G. Shiavi, R. M. Salomon:
Direct acoustic feature using iterative EM algorithm and spectral energy for classifying suicidal speech.
766-769
- Claudio Garretón, Néstor Becerra Yoma, Fernando Huenupán, Carlos Molina:
On comparing and combining intra-speaker variability compensation and unsupervised model adaptation in speaker verification.
770-773
- Xianyu Zhao, Yuan Dong, Hao Yang, Jian Zhao, Liang Lu, Haila Wang:
Comparison of two kinds of speaker location representation for SVM-based speaker verification.
774-777
- Mireia Farrús, Javier Hernando, Pascual Ejarque:
Jitter and shimmer measurements for speaker recognition.
778-781
- Zhenyu Shan, Yingchun Yang, Ruizhi Ye:
Natural-emotion GMM transformation algorithm for emotional speaker recognition.
782-785
- Ivy H. Tseng, Olivier Verscheure, Deepak S. Turaga, Upendra V. Chaudhari:
Optimized one-bit quantization for adapted GMM-based speaker verification.
786-789
- Mitchell McLaren, Robbie Vogt, Brendan Baker, Sridha Sridharan:
A comparison of session variability compensation techniques for SVM-based speaker recognition.
790-793
- Benoit G. B. Fauve, Nicholas W. D. Evans, Neil Pearson, Jean-François Bonastre, John S. D. Mason:
Influence of task duration in text-independent speaker verification.
794-797
Speech Enhancement
- Kamil K. Wójcicki, Stephen So, Kuldip K. Paliwal:
The effect of the additivity assumption on time and frequency domain Wiener filtering for speech enhancement.
798-801
- Junfeng Li, Shuichi Sakamoto, Satoshi Hongo, Masato Akagi, Yôiti Suzuki:
Noise reduction based on adaptive β-order generalized spectral subtraction for speech enhancement.
802-805
- Amit Das, John H. L. Hansen:
Class constrained ROVER based speech enhancement.
806-809
- Erhan Deger, Md. Khademul Islam Molla, Keikichi Hirose, Nobuaki Minematsu, Md. Kamrul Hasan:
EMD based soft-thresholding for speech enhancement.
810-813
- Adam Borowicz, Alexander A. Petrovsky:
An approximate solution for perceptually constrained signal subspace speech enhancement method.
814-817
- Tim Fingscheidt, Suhadi Suhadi:
Quality assessment of speech enhancement systems by separation of enhanced speech, noise, and echo.
818-821
- Anis Ben Aicha, Sofia Ben Jebara:
Perceptual musical noise reduction using critical bands tonality coefficients and masking thresholds.
822-825
- Dirk Mauler, Anil M. Nagathil, Rainer Martin:
On optimal estimation of compressed speech for hearing aids.
826-829
- Richard C. Hendriks, Jesper Jensen, Richard Heusdens:
DFT domain subspace based noise tracking for speech enhancement.
830-833
- Nitish Krishnamurthy, John H. L. Hansen:
Noise tracking for speech systems in adverse environments.
834-837
- Abderrahman Essebbar, Tristan Poinsard:
Speech enhancement using multi-reference noise reduction in a vehicle environment.
838-841
- Ernst Warsitz, Reinhold Haeb-Umbach, Dang Hai Tran Vu:
Blind adaptive principal eigenvector beamforming for acoustical source separation.
842-845
- Zbynek Koldovský, Petr Tichavský:
Time-domain blind audio source separation using advanced ICA methods.
846-849
- Siu Wa Lee, Frank K. Soong, Pak-Chung Ching:
Model-based speech separation with single-microphone input.
850-853
- Keisuke Kinoshita, Marc Delcroix, Tomohiro Nakatani, Masato Miyoshi:
Multi-step linear prediction based speech dereverberation in noisy reverberant environment.
854-857
- Seung Yeol Lee, Jong Won Shin, Hwan Sik Yun, Nam Soo Kim:
A statistical model based post-filtering algorithm for residual echo suppression.
858-861
- Xiaoshan Huang, Xiaoqun Zhao:
An optimal speech enhancement under speech uncertainty probability and masking property of auditory system.
862-865
Structure-based and Template-based Automatic Speech Recognition
- Viktoria Maier, Roger K. Moore:
Temporal episodic memory model: an evolution of MINERVA2.
866-869
- Gianpaolo Coro, Francesco Cutugno, Fulvio Caropreso:
Speech recognition with factorial-HMM syllabic acoustic models.
870-873
- Mathias De Wachter, Kris Demuynck, Patrick Wambacq, Dirk Van Compernolle:
Evaluating acoustic distance measures for template based recognition.
874-877
- Yan Han, Lou Boves:
Hierarchical acoustic modeling based on random-effects regression for automatic speech recognition.
878-881
- Annika Hämäläinen, Louis ten Bosch, Lou Boves:
Construction and analysis of multiple paths in syllable models.
882-885
- Carol Y. Espy-Wilson, Tarun Pruthi, Amit Juneja, Om Deshmukh:
Landmark-based approach to speech recognition: an alternative to HMMs.
886-889
- Satoshi Asakawa, Nobuaki Minematsu, Keikichi Hirose:
Automatic recognition of connected vowels only using speaker-invariant representation of speech dynamics.
890-893
- Roberto Togneri, Li Deng:
A structured speech model parameterized by recursive dynamics and neural networks.
894-897
- Li Deng, Helmer Strik:
Structure-based and template-based automatic speech recognition - comparing parametric and non-parametric approaches.
898-901
- David Grangier, Samy Bengio:
Learning the inter-frame distance for discriminative template-based keyword detection.
902-905
- Dong Yu, Li Deng, Alex Acero:
Handling phonetic context and speaker variation in a structure-based speech recognizer.
906-909
Robust ASR Against Noise and Reverberation
- Maarten Van Segbroeck, Hugo Van hamme:
Vector-quantization based mask estimation for missing data automatic speech recognition.
910-913
- Sébastien Demange, Christophe Cerisara, Jean Paul Haton:
Accurate marginalization range for missing data recognition.
914-917
- Marco Kühne, Roberto Togneri, Sven Nordholm:
Smooth soft mel-spectrographic masks based on blind sparse source separation.
918-921
- Jonathan Laidler, Martin Cooke, Neil D. Lawrence:
Model-driven detection of clean speech patches in noise.
922-925
- Richard M. Stern, Evandro B. Gouvêa, Govindarajan Thattai:
"Polyaural" array processing for automatic speech recognition in degraded environments.
926-929
- Nicolás Morales, Liang Gu, Yuqing Gao:
Adding noise to improve noise robustness in speech recognition.
930-933
Language Resources and Tools
- Eric Fosler-Lussier, Laura Dilley, Na'im Tyson, Mark Pitt:
The Buckeye corpus of speech: updates and enhancements.
934-937
- Nora Barroso, Aitzol Ezeiza, N. Gilisagasti, Karmele López de Ipiña, A. López, J. M. López:
Development of multimodal resources for multilingual information retrieval in the Basque context.
938-941
- Reva Schwartz, Wade Shen, Joseph P. Campbell, Shelley Paget, Julie Vonwiller, Dominique Estival, Christopher Cieri:
Construction of a phonotactic dialect corpus using semiautomatic annotation.
942-945
- Slim Abdennadher, Mohamed Aly, Dirk Bühler, Wolfgang Minker, Johannes Pittermann:
BECAM tool - a semi-automatic tool for bootstrapping emotion corpus annotation and management.
946-949
- Christopher Cieri, Linda Corson, David Graff, Kevin Walker:
Resources for new research directions in speaker recognition: the Mixer 3, 4 and 5 corpora.
950-953
- Peter A. Heeman, Andy McMillin, J. Scott Yaruss:
Intercoder reliability in annotating complex disfluencies.
954-957
Single-channel Speech Enhancement
- Mohammad H. Radfar, Richard M. Dansereau:
Single channel speech separation using maximum a posteriori estimation.
958-961
- Suhadi Suhadi, Tim Fingscheidt:
Speech enhancement with improved a posteriori SNR computation.
962-965
- Thang Vu Tat, Germine Seide, Masashi Unoki, Masato Akagi:
Method of LP-based blind restoration for improving intelligibility of bone-conducted speech.
966-969
- Tiago H. Falk, Svante Stadler, W. Bastiaan Kleijn, Wai-Yip Chan:
Noise suppression based on extending a speech-dominated modulation band.
970-973
- Amin Haji Abolhassani, Sid-Ahmed Selouani, Douglas D. O'Shaughnessy, Mohamed-Faouzi Harkat:
Speech enhancement using PCA and variance of the reconstruction error model identification.
974-977
- Jong Won Shin, Woohyung Lim, June Sig Sung, Nam Soo Kim:
Speech reinforcement based on partial specific loudness.
978-981
Phonetics and Phonology
- Tamara Rathcke, Jonathan Harrington:
The phonetics and phonology of high and low tones in two falling f0-contours in Standard German.
982-985
- Tina John, Jonathan Harrington:
Temporal alignment of creaky voice in neutralised realisations of an underlying, post-nasal voicing contrast in German.
986-989
- Mike Demol, Werner Verhelst, Piet Verhoeve:
The duration of speech pauses in a multilingual environment.
990-993
- Dafydd Gibbon, Jolanta Bachan, Grazyna Demenko:
Syllable timing patterns in Polish: results from annotation mining.
994-997
- Constandinos Kalimeris, Stelios Bakamidis:
Minimal pairs and functional loads of sound contrasts obtained from a list of Modern Greek words.
998-1001
- Daan Wissing:
More on acoustic correlates of stress.
1002-1005
- Cécile Woehrling, Philippe Boula de Mareüil:
Comparing Praat and Snack formant measurements on two large corpora of northern and southern French.
1006-1009
- William J. Barry, Bistra Andreeva, Ingmar Steiner:
The phonetic exponency of phrasal accentuation in French and German.
1010-1013
- Christiana Christodoulou:
Phonetic geminates in Cypriot Greek: the case of voiceless plosives.
1014-1017
- Darcie Williams, François Poiré:
Predicting vowel duration in spontaneous Canadian French speech.
1018-1021
- Ivan Chow, François Poiré:
Rhotic variation and schwa epenthesis in Windsor French.
1022-1025
- Audrey Bürki, Cécile Fougeron, Cédric Gendrot:
On the categorical nature of the process involved in schwa elision in French.
1026-1029
- Yue-Ning Hu, Min Chu, Chao Huang, Yan-Ning Zhang:
Exploring tonal variations via context-dependent tone models.
1030-1033
- Philippe Martin, Jun Li:
Acoustic analysis of the neutral tone in Mandarin.
1034-1037
- Rerrario Shui-Ching Ho, Yoshinori Sagisaka:
F0 analysis of perceptual distance among Cantonese level tones.
1038-1041
Robust ASR I, II
- Yu Hu, Qiang Huo:
Irrelevant variability normalization based HMM training using VTS approximation of an explicit model of environmental distortions.
1042-1045
- Luis Buera, Antonio Miguel, Eduardo Lleida, Oscar Saz, Alfonso Ortega:
On the jointly unsupervised feature vector normalization and acoustic model compensation for robust speech recognition.
1046-1049
- Yu Tsao, Chin-Hui Lee:
An ensemble modeling approach to joint characterization of speaker and speaking environments.
1050-1053
- Shih-Hsiang Lin, Yao-Ming Yeh, Berlin Chen:
Cluster-based polynomial-fit histogram equalization (CPHEQ) for robust speech recognition.
1054-1057
- Pedro M. Martinez, José C. Segura, Luz García:
Robust distributed speech recognition using histogram equalization and correlation information.
1058-1061
- Jen-Tzung Chien, Koichi Shinoda, Sadaoki Furui:
Predictive minimum Bayes risk classification for robust speech recognition.
1062-1065
- Ning Ma, Jon Barker, Phil Green:
Applying word duration constraints by using unrolled HMMs.
1066-1069
- Xiong Xiao, Engsiong Chng, Haizhou Li:
Evaluating the temporal structure normalisation technique on the Aurora-4 task.
1070-1073
- Hynek Boril, Petr Fousek, Harald Höge:
Two-stage system for robust neutral/Lombard speech recognition.
1074-1077
- Takatoshi Jitsuhiro, Tomoji Toriyama, Kiyoshi Kogure:
Noise suppression using search strategy with multi-model compositions.
1078-1081
- Takanobu Nishiura, Yoshiki Hirano, Yuki Denda, Masato Nakayama:
Investigations into early and late reflections on distant-talking speech recognition toward suitable reverberation criteria.
1082-1085
- Stefan Windmann, Reinhold Haeb-Umbach:
An approach to iterative speech feature enhancement and recognition.
1086-1089
- Jeih-Weih Hung:
Optimization of temporal filters in the modulation frequency domain for constructing robust features in speech recognition.
1090-1093
- Rico Petrick, Kevin Lohde, Matthias Wolff, Rüdiger Hoffmann:
The harming part of room acoustics in automatic speech recognition.
1094-1097
- Yuan-Fu Liao, Yh-Her Yang, Chi-Hui Hsu, Cheng-Chang Lee, Jing-Teng Zeng:
A reference model weighting-based method for robust speech recognition.
1098-1101
- Babak Nasersharif, Ahmad Akbari, Mohammad Mehdi Homayounpour:
Mel sub-band filtering and compression for robust speech recognition.
1102-1105
Features for ASR
- Chang-Wen Hsu, Lin-Shan Lee:
Extended powered cepstral normalization (p-CN) with range equalization for robust features in speech recognition.
1106-1109
- Makoto Sakai, Norihide Kitaoka, Seiichi Nakagawa:
Selection of optimal dimensionality reduction method using Chernoff bound for segmental unit input HMM.
1110-1113
- Vivek Tyagi:
Fepstrum: an improved modulation spectrum for ASR.
1114-1117
- Dusan Macho:
Narrowband to wideband feature expansion for robust multilingual ASR.
1118-1121
- Weifeng Li, Hervé Bourlard:
Non-linear spectral contrast stretching for in-car speech recognition.
1122-1125
- Xiao-Bing Li, Douglas D. O'Shaughnessy:
Clustering-based two-dimensional linear discriminant analysis for speech recognition.
1126-1129
- Yotaro Kubo, Shigeki Okawa, Akira Kurematsu, Katsuhiko Shirai:
A study on temporal features derived by analytic signal.
1130-1133
- Stephen A. Zahorian, Tara Singh, Hongbing Hu:
Dimensionality reduction of speech features using nonlinear principal components analysis.
1134-1137
- D. Rama Sanand, D. Dinesh Kumar, Srinivasan Umesh:
Linear transformation approach to VTLN using dynamic frequency warping.
1138-1141
- Vladimir Fabregas Surigué de Alencar, Abraham Alcaim:
Features interpolation domain for distributed speech recognition and performance for ITU-T G.723.1 codec.
1142-1145
- Shoei Sato, Kazuo Onoe, Akio Kobayashi, Shinichi Homma, Toru Imai, Tohru Takagi, Tetsunori Kobayashi:
Dynamic integration of multiple feature streams for robust real-time LVCSR.
1146-1149
- Hironori Matsumasa, Tetsuya Takiguchi, Yasuo Ariki, Ichao Li, Toshitaka Nakabayashi:
PCA-based feature extraction for fluctuation in speaking style of articulation disorders.
1150-1153
- Fabio Valente, Jithendra Vepa, Hynek Hermansky:
Multi-stream features combination based on Dempster-Shafer rule for LVCSR system.
1154-1157
- Natasha Singh-Miller, Michael Collins, Timothy J. Hazen:
Dimensionality reduction for speech recognition using neighborhood components analysis.
1158-1161
- Dan Su, Xihong Wu, Huisheng Chi:
Probabilistic latent speaker analysis for large vocabulary speech recognition.
1162-1165
- S. R. Mahadeva Prasanna, Hynek Hermansky:
MRASTA and PLP in automatic speech recognition.
1166-1169
Objective Assessment of Voice and Speech Quality
- Markus Brückl:
Women's vocal aging: a longitudinal approach.
1170-1173
- Laurence Cnockaert, Jean Schoentgen, Canan Ozsancak, Pascal Auzou, Francis Grenez:
Effect of intensive voice therapy on vocal tremor for Parkinson speakers.
1174-1177
- Ali Alpan, Abdellah Kacha, Francis Grenez, Jean Schoentgen:
Assessment of vocal dysperiodicities in connected disordered speech.
1178-1181
- Anne-Maria Laukkanen, Jaromír Horácek, Pavel Svancara, Elina Lehtinen:
Effects of FE modelled consequences of tonsillectomy on perceptual evaluation of voice.
1182-1185
- Irma Verdonck-de Leeuw, Louis ten Bosch, Li Ying Chao, Rico N. P. M. Rinkel, Pepijn A. Borggreven, Lou Boves, C. René Leemans:
Speech quality after major surgery of the oral cavity and oropharynx with microvascular soft tissue reconstruction.
1186-1189
- Christel G. de Bruijn, Sandra P. Whiteside:
Voice fatigue and use of speech recognition: a study of voice quality ratings.
1190-1193
- Jean-François Bonastre, Corinne Fredouille, Alain Ghio, Antoine Giovanni, Gilles Pouchoulin, Joana Revis, Bernard Teston, P. Yu:
Complementary approaches for voice disorder assessment.
1194-1197
- Gilles Pouchoulin, Corinne Fredouille, Jean-François Bonastre, Alain Ghio, Antoine Giovanni:
Frequency study for the characterization of the dysphonic voices.
1198-1201
- Victor J. Boucher:
Acoustic correlates of laryngeal-muscle fatigue: findings for a phonometric prevention of acquired voice pathologies.
1202-1205
- Andreas Maier, Maria Schuster, Anton Batliner, Elmar Nöth, Emeka Nkenke:
Automatic scoring of the intelligibility in patients with cancer of the oral cavity.
1206-1209
- Jacques Duchateau, Leen Cleuren, Hugo Van hamme, Pol Ghesquière:
Automatic assessment of children's reading level.
1210-1213
- Carlos A. Ferrer, María Esperanza Hernández-Díaz, Eduardo González:
Using waveform matching techniques in the measurement of shimmer in voiced signals.
1214-1217
- Rubén Fraile, Juan Ignacio Godino-Llorente, Nicolás Sáenz-Lechón, Víctor Osma-Ruiz, Pedro Gómez Vilda:
Analysis of the impact of analogue telephone channel on MFCC parameters for voice pathology detection.
1218-1221
- Claudia Manfredi, L. Bocchi, G. Cantarella, Giorgio Peretti, G. Guidi, V. Mezzatesta:
Objective parameters from videokymographic images: a user-friendly interface.
1222-1225
Speaker Verification & Identification I-IV
- Elizabeth Shriberg, Luciana Ferrer:
A text-constrained prosodic system for speaker verification.
1226-1229
- Asmaa El Hannani, Dijana Petrovska-Delacrétaz:
Fusing acoustic, phonetic and data-driven systems for text-independent speaker verification.
1230-1233
- Najim Dehak, Patrick Kenny, Pierre Dumouchel:
Continuous prosodic features and formant modeling with joint factor analysis for speaker verification.
1234-1237
- Claudio Vair, Daniele Colibro, Fabio Castaldo, Emanuele Dalmasso, Pietro Laface:
Loquendo - Politecnico di Torino's 2006 NIST speaker recognition evaluation system.
1238-1241
- Driss Matrouf, Nicolas Scheffer, Benoit G. B. Fauve, Jean-François Bonastre:
A straightforward and efficient implementation of the factor analysis model for speaker verification.
1242-1245
- Timothy J. Hazen, Daniel Schultz:
Multi-modal user authentication from video for mobile or variable-environment applications.
1246-1249
Discourse, Dialog and Emotion Expression
Prosodic Modeling I, II
- Keikichi Hirose, Keiko Ochi, Nobuaki Minematsu:
Corpus-based generation of prosodic features from text based on generation process model.
1274-1277
- Jilei Tian, Jani Nurminen, Imre Kiss:
Novel eigenpitch-based prosody model for text-to-speech synthesis.
1278-1281
- Volker Strom, Ani Nenkova, Robert A. J. Clark, Yolanda Vazquez-Alvarez, Jason M. Brenier, Simon King, Dan Jurafsky:
Modelling prominence and emphasis improves unit-selection synthesis.
1282-1285
- Seiya Takada, Yuji Yagi, Keikichi Hirose, Nobuaki Minematsu:
A framework of reply speech generation for concept-to-speech conversion in spoken dialogue systems.
1286-1289
- Thorsten Stocksmeier, Stefan Kopp, Dafydd Gibbon:
Synthesis of prosodic attitudinal variants in German backchannel ja.
1290-1293
- Ke Li, Yoko Greenberg, Yoshinori Sagisaka:
Inter-language prosodic style modification experiment using word impression vector for communicative speech generation.
1294-1297
Resource Acquisition and Preparation; Resource and System Evaluation
- Ivan Habernal, Miloslav Konopík:
JAAE: the Java abstract annotation editor.
1298-1301
- Goshu Nagino, Makoto Shozakai, Kiyohiro Shikano:
How to judge reusability of existing speech corpora for target task by utilizing statistical multidimensional scaling.
1302-1305
- Peter Rutten:
Feasibility of constructing an expressive speech corpus from television soap opera dialogue.
1306-1309
- Rosemary Orr, Bernat González i Llinares, Françoise Petersen, Helge Hüttenrauch, Martin Böcker, Michael Tate:
Collection of empirical data for standardization of generic vocabularies in speech driven ICT devices and services.
1310-1313
- Antonio Marcos Selmini, Fábio Violaro:
Acoustic-phonetic features for refining the explicit speech segmentation.
1314-1317
- Benjamin Lecouteux, Georges Linarès, Frédéric Beaugendre, Pascal Nocera:
Text island spotting in large speech databases.
1318-1321
- Tim Paek, Yun-Cheng Ju, Christopher Meek:
People watcher: a game for eliciting human-transcribed data for automated directory assistance.
1322-1325
- Andrew L. Kun, Tim Paek, Zeljko Medenica:
The effect of speech interface accuracy on driving performance.
1326-1329
- Hua Zhang, Lijuan Wang, Frank K. Soong, Wenju Liu:
Context constrained-generalized posterior probability for verifying phone transcriptions.
1330-1333
- Pongtep Angkititrakul, DongGu Kwak, SangJo Choi, JeongHee Kim, Anh PhucPhan, Amardeep Sathyanarayana, John H. L. Hansen:
Getting start with UTDrive: driver-behavior modeling and assessment of distraction for in-vehicle speech systems.
1334-1337
- BalaKrishna Kolluru, Yoshihiko Gotoh:
Relative evaluation of informativeness in machine generated summaries.
1338-1341
- Toshiyuki Takezawa, Masahide Mizushima, Tohru Shimizu, Gen-ichiro Kikui:
A method for evaluating task-oriented spoken dialog translation systems based on communication efficiency.
1342-1345
- Charlotte van Hooijdonk, Edwin Commandeur, Reinier Cozijn, Emiel Krahmer, Erwin Marsi:
Using eye movements for online evaluation of speech synthesis.
1346-1349
- Jian Li, Dmitry Sityaev, Jie Hao:
Sentence level intelligibility evaluation for Mandarin text-to-speech systems using semantically unpredictable sentences.
1350-1353
- Judith M. Kessens, David A. van Leeuwen:
N-best: the Northern- and Southern-Dutch benchmark evaluation of speech recognition technology.
1354-1357
- Trym Holter, Svein Sørsdal:
A MAP based approach to adaptive speech intelligibility measurements.
1358-1361
- Sirinoot Boonsuk, Proadpran Punyabukkana, Atiwong Suchato:
Phone boundary detection using selective refinements and context-dependent acoustic features.
1362-1365
Speech Production I, II
- Sorin Dusan:
Vocal tract length during speech production.
1366-1369
- Nobuhiro Miki, Kyohei Hayashi:
Approximation method of subglottal system using ARMA filter.
1370-1373
- Asterios Toutios, Konstantinos G. Margaritis:
Enhancing acoustic-to-EPG mapping with lip position information.
1374-1377
- Tokihiko Kaburagi, Yosuke Tanabe:
A model of glottal flow incorporating viscous-inviscid interaction.
1378-1381
- Kilian G. Seeber:
Thinking outside the cube: modeling language processing tasks in a multiple resource paradigm.
1382-1385
- Julien Cisonni, Annemie Van Hirtum, Jan Willems, Xavier Pelorson:
Experimental validation of direct and inverse glottal flow models for unsteady flow conditions.
1386-1389
- Hideyuki Nomura, Tetsuo Funada:
Effect of unsteady glottal flow on the speech production process.
1390-1393
- Katrin Schneider, Bernd Möbius:
Word stress correlates in spontaneous child-directed speech in German.
1394-1397
- Michael Aron, Nicolas Ferveur, Erwan Kerrien, Marie-Odile Berger, Yves Laprie:
Acquisition and synchronization of multimodal articulatory data.
1398-1401
- Vincent Robert, Yves Laprie, Anne Bonneau:
A phonetic concatenative approach of labial coarticulation.
1402-1405
- Aseel Turkmani, Adrian Hilton, Philip J. B. Jackson, James D. Edge:
Visual analysis of lip coarticulation in VCV utterances.
1406-1409
- Matti Airas, Paavo Alku:
Comparison of multiple voice source parameters in different phonation types.
1410-1413
- Monja A. Knoll, Lisa Scharrer:
Acoustic and affective comparisons of natural and imaginary infant-, foreigner- and adult-directed speech.
1414-1417
- André Araújo, Luis M. T. Jesus, Isabel M. Costa:
Vowel production in two occlusal classes.
1418-1421
- Rajesh Khatiwada:
Nepalese retroflex stops: a static palatography study of inter- and intra-speaker variability.
1422-1425
- Charles A. Lamoureux, Victor J. Boucher:
Effects of testosterone levels on temporal and intonational aspects of speech: more exploratory data.
1426-1428
ASR: New Paradigms
- Tien Ping Tan, Laurent Besacier:
Modeling context and language variation for non-native speech recognition.
1429-1432
- Xufang Zhao, Douglas D. O'Shaughnessy:
An evaluation of cross-language adaptation and native speech training for rapid HMM construction based on very limited training data.
1433-1436
- Konstantin Markov, Satoshi Nakamura:
Never-ending learning with dynamic hidden Markov network.
1437-1440
- Catherine Breslin, Mark J. F. Gales:
Building multiple complementary systems using directed decision trees.
1441-1444
- Hiroaki Nanjo, Yuichi Oku, Takehiko Yoshimi:
Automatic speech recognition framework for multilingual audio contents.
1445-1448
- Ghazi Bouselmi, Dominique Fohr, Irina Illina:
Combined acoustic and pronunciation modelling for non-native speech recognition.
1449-1452
- Tadashi Emori, Yoshifumi Onishi, Koichi Shinoda:
Automatic estimation of scaling factors among probabilistic models in speech recognition.
1453-1456
- Emilian Stoimenov, John W. McDonough:
Memory efficient modeling of polyphone context with weighted finite-state transducers.
1457-1460
- Valeriy Pylypenko:
Extra large vocabulary continuous speech recognition algorithm based on information retrieval.
1461-1464
- I. Lee Hetherington:
PocketSUMMIT: small-footprint continuous speech recognition.
1465-1468
- Tobias Cincarek, Izumi Shindo, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano:
Development of preschool children subsystem for ASR and Q&A in a real-environment speech-oriented guidance task.
1469-1472
- Chengyuan Ma, Chin-Hui Lee:
A study on word detector design and knowledge-based pruning and rescoring.
1473-1476
- Thomas Colthurst, Tresi Arvizo, Chia-Lin Kao, Owen Kimball, Stephen A. Lowe, David R. H. Miller, Jim Van Sciver:
Parameter tuning for fast speech recognition.
1477-1480
- Louis ten Bosch, Bert Cranen:
A computational model for unsupervised word discovery.
1481-1484
- Bernd T. Meyer, Matthias Wächter, Thomas Brand, Birger Kollmeier:
Phoneme confusions in human and automatic speech recognition.
1485-1488
- Kengo Ohta, Masatoshi Tsuchiya, Seiichi Nakagawa:
Construction of spoken language model including fillers using filler prediction model.
1489-1492
- Raghunandan Kumaran, Jeff Bilmes, Katrin Kirchhoff:
Attention shift decoding for conversational speech recognition.
1493-1496
Speech and Language Technology for Less-resourced Languages
- Péter Mihajlik, Tibor Fegyó, Zoltán Tüske, Pavel Ircing:
A morpho-graphemic approach for the recognition of spontaneous speech in agglutinative languages - like Hungarian.
1497-1500
- Mei Yang, Jing Zheng, Andreas Kathol:
A semi-supervised learning approach for morpheme segmentation for an Arabic dialect.
1501-1504
- Gerhard B. Van Huyssteen, Martin J. Puttkammer:
Accelerating the annotation of lexical data for less-resourced languages.
1505-1508
- Christoph Draxler:
On web-based creation of speech resources for less-resourced languages.
1509-1512
- Miroslav Martinovic, Srđan Vesic, Goran Rakic:
Building an information retrieval system for Serbian - challenges and solutions.
1513-1516
- Guy De Pauw, Peter Waiganjo Wagacha:
Bootstrapping morphological analysis of Gĩkũyũ using unsupervised maximum entropy learning.
1517-1520
- Jerneja Zganec-Gros, Stanislav Gruden:
The VoiceTRAN machine translation system.
1521-1524
- Sérgio Paulo, Luís C. Oliveira:
MuLAS: a framework for automatically building multi-tier corpora.
1525-1528
- Jacquelijn Ringersma, Marc Kemps-Snijders:
Creating multimedia dictionaries of endangered languages using LEXUS.
1529-1532
- Hrafn Loftsson, Eiríkur Rögnvaldsson:
IceNLP: a natural language processing toolkit for Icelandic.
1533-1536
- Marius Peche, Marelie H. Davel, Etienne Barnard:
Phonotactic spoken language identification with limited training data.
1537-1540
- Solomon Teferra Abate, Wolfgang Menzel:
Automatic speech recognition for an under-resourced language - Amharic.
1541-1544
- Abdillahi Nimaan, Pascal Nocera, Frédéric Béchet, Jean-François Bonastre:
Information retrieval strategies for accessing African audio corpora.
1545-1548
- Vesa Siivola, Mathias Creutz, Mikko Kurimo:
Morfessor and VariKN machine learning tools for speech and language technology.
1549-1552
- Markpong Jongtaveesataporn, Issara Thienlikit, Chai Wutiwiwatchai, Sadaoki Furui:
Towards better language modeling for Thai LVCSR.
1553-1556
Adaptation in ASR I, II
Speech Perception I, II
- Douglas Brungart, Nandini Iyer:
Time-compressed speech perception with speech and noise maskers.
1581-1584
- Anne Cutler, Martin Cooke, Maria Luisa Garcia Lecumberri, Dennis Pasveer:
L2 consonant identification in noise: cross-language comparisons.
1585-1588
- Jennifer T. Le, Catherine T. Best, Michael D. Tyler, Christian Kroos:
Effects of non-native dialects on spoken word recognition.
1589-1592
- Julien Meyer, Fanny Meunier, Laure Dentel:
Identification of natural whistled vowels by non-whistlers.
1593-1596
- Alexandra Jesse, James M. McQueen:
Prelexical adjustments to speaker idiosyncrasies: are they position-specific?
1597-1600
- Holger Mitterer:
Top-down effects on compensation for coarticulation are not replicable.
1601-1604
Spoken Language Understanding
- Christian Raymond, Giuseppe Riccardi:
Generative and discriminative algorithms for spoken language understanding.
1605-1608
- Elias Iosif, Alexandros Potamianos:
A soft-clustering algorithm for automatic induction of semantic classes.
1609-1612
- Agustín Gravano, Stefan Benus, Julia Hirschberg, Shira Mitchell, Ilia Vovsha:
Classification of discourse functions of affirmative words in spoken dialogue.
1613-1616
- Bogdan Minescu, Géraldine Damnati, Frédéric Béchet, Renato de Mori:
Conditional use of word lattices, confusion networks and 1-best string hypotheses in a sequential interpretation strategy.
1617-1620
- Jáchym Kolár, Yang Liu, Elizabeth Shriberg:
Speaker adaptation of language models for automatic dialog act segmentation of meetings.
1621-1624
- Amparo Albalate, Dimitar Dimitrov, Roberto Pieraccini:
Unsupervised categorisation approaches for technical support automated agents.
1625-1628
Pitch Extraction I, II
- Michael Wohlmayr, Marián Képesi:
Joint position-pitch extraction from multichannel audio.
1629-1632
- Hyun Soo Kim:
Morphological pre-processing technique and its applications on speech signal.
1633-1636
- Patricia A. Pelle, Claudio Estienne:
A pitch extraction system based on phase locked loops and consensus decision.
1637-1640
- Milan Legát, Jindrich Matousek, Daniel Tihelka:
A robust multi-phase pitch-mark detection algorithm.
1641-1644
- Md. Khademul Islam Molla, Keikichi Hirose, Nobuaki Minematsu, Md. Kamrul Hasan:
Pitch estimation of noisy speech signals using empirical mode decomposition.
1645-1648
- Daniel Hirst, Hyongsil Cho, Sunhee Kim, Hyunji Yu:
Evaluating two versions of the Momel pitch modelling algorithm on a corpus of read speech in Korean.
1649-1652
- Hussein Hussein, Oliver Jokisch:
Hybrid electroglottograph and speech signal based algorithm for pitch marking.
1653-1656
Speech Coding and Transmission
- Saikat Chatterjee, Thippur V. Sreenivas:
Normalized two stage SVQ for minimum complexity wide-band LSF quantization.
1657-1660
- Peng Zhang, Changchun Bao:
A novel 2kb/s waveform interpolation speech coder based on non-negative matrix factorization.
1661-1664
- Ahmed Ismail, Yasser Dakroury, Hazem Abbas:
A novel energy distribution comparison approach for robust speech spectrum vector quantization.
1665-1668
- Ahmed Ismail, Yasser Dakroury, Hazem Abbas:
Novel low-band phase representation for low bit-rate speech coding.
1669-1672
- Chun-Feng Wu, Cheng-Lung Lee, Wen-Whei Chang:
Perceptual-based playout mechanisms for multi-stream voice over IP networks.
1673-1676
- Robert Zopf, Jes Thyssen, Juin-Hwey Chen:
Time-warping and re-phasing in packet loss concealment.
1677-1680
- Yannis Agiomyrgiannakis, Yannis Stylianou:
The harmonic model codec (HMC) framework for VoIP.
1681-1684
- Yannis Agiomyrgiannakis, Yannis Stylianou:
Bit-erasure channel decoding for GMM-based multiple description coding.
1685-1688
- Hua Yuan, Tiago H. Falk, Wai-Yip Chan:
Degradation-classification assisted single-ended quality measurement of speech.
1689-1692
- Alexander Raake, Sascha Spors, Jens Ahrens, Jitendra Ajmera:
Concept and evaluation of a downward-compatible system for spatial teleconferencing using automatic speaker clustering.
1693-1696
- Min-Ki Lee, Kyung-Tae Kim, Hong-Goo Kang, Dae Hee Youn:
Speech quality estimation using packet loss effects in CELP-type speech coders.
1697-1700
- Masahiro Oshikiri, Hiroyuki Ehara, Toshiyuki Morii, Tomofumi Yamanashi, Kaoru Satoh, Koji Yoshida:
An 8-32 kbit/s scalable wideband coder extended with MDCT-based bandwidth extension on top of a 6.8 kbit/s narrowband CELP coder.
1701-1704
Topics in Acoustic Modeling
- Robert Wielgat, Tomasz P. Zielinski, Pawel Swietojanski, Piotr Zoladz, Daniel Król, Tomasz Wozniak, Stanislaw Grabias:
Comparison of HMM and DTW methods in automatic recognition of pathological phoneme pronunciation.
1705-1708
- Kai Yu, Mark J. F. Gales, Philip C. Woodland:
Unsupervised training with directed manual transcription for recognising Mandarin broadcast audio.
1709-1712
- Hao Wu, Xihong Wu:
Context dependent syllable acoustic model for continuous Chinese speech recognition.
1713-1716
- Dimitris Oikonomidis, Vassilios Diakoloukas, Vassilios Digalakis:
A sub-optimal Viterbi-like search for linear dynamic models classification.
1717-1720
- Georg Heigold, Ralf Schlüter, Hermann Ney:
On the equivalence of Gaussian HMM and Gaussian HMM-like hidden conditional random fields.
1721-1724
- Stefano Scanzio, Pietro Laface, Roberto Gemello, Franco Mana:
Speeding-up neural network training using sentence and frame selection.
1725-1728
- Linquan Liu, Thomas Fang Zheng, Makoto Akabane, Ruxin Chen, Wenhu Wu:
Using a small development set to build a robust dialectal Chinese speech recognizer.
1729-1732
Confidence Measures (and Related Topics)
- Carlos Molina, Néstor Becerra Yoma, Fernando Huenupán, Claudio Garretón:
Unsupervised re-scoring of observation probability in Viterbi based on reinforcement learning by using confidence measure and HMM neighborhood.
1733-1736
- Shiuan-Sung Lin, François Yvon:
Optimization on decoding graphs by discriminative training.
1737-1740
- Stéphane Huet, Guillaume Gravier, Pascale Sébillot:
Morphosyntactic processing of n-best lists for improved recognition and confidence measure computation.
1741-1744
- Xiang Li, Juan M. Huerta:
How predictable is ASR confidence in dialog applications?
1745-1748
- Alexandre Allauzen:
Error detection in confusion network.
1749-1752
- Takanobu Oba, Takaaki Hori, Atsushi Nakamura:
An approach to efficient generation of high-accuracy and compact error-corrective models for speech recognition.
1753-1756
- Hamed Ketabdar, Mirko Hannemann, Hynek Hermansky:
Detection of out-of-vocabulary words in posterior based ASR.
1757-1760
Grapheme-to-Phoneme Conversion
- Daniela Braga, Luís Pinto Coelho, Fernando Gil Vianna Resende Jr.:
Homograph ambiguity resolution in front-end design for Portuguese TTS systems.
1761-1764
- Ghinwa F. Choueiter, Stephanie Seneff, James R. Glass:
New word acquisition using subword modeling.
1765-1768
- Samuel Thomas, Ashish Verma:
Language identification of person names using CF-IOF based weighing function.
1769-1772
- Henk van den Heuvel, Jean-Pierre Martens, Nanneke Konings:
G2P conversion of names: what can we do (better)?
1773-1776
- Ausdang Thangthai, Chai Wutiwiwatchai, Anocha Rugchatjaroen, Sittipong Saychum:
A learning method for Thai phonetization of English words.
1777-1780
- Steffen Werner, Rüdiger Hoffmann:
Spontaneous speech synthesis by pronunciation variant selection - a comparison to natural speech.
1781-1784
- Nikos Tsourakis, Vassilios Digalakis:
A generic methodology of converting transliterated text to phonetic strings case study: Greeklish.
1785-1788
- Rita Singh, Evandro B. Gouvêa, Bhiksha Raj:
Probabilistic deduction of symbol mappings for extension of lexicons.
1789-1792
Lexical and Prosodic Modeling
- Sergey Astrov, Joachim Hofer, Harald Höge:
Use of syllable center detection for improved duration modeling in Chinese Mandarin connected digits recognition.
1793-1796
- Thomas Pellegrini, Lori Lamel:
Using phonetic features in unsupervised word decompounding for ASR with application to a less-represented language.
1797-1800
- Sheng Qiang, Yao Qian, Frank K. Soong, Congfu Xu:
Robust F0 modeling for Mandarin speech recognition in noise.
1801-1804
- Dino Seppi, Daniele Falavigna, Georg Stemmer, Roberto Gretter:
Word duration modeling for word graph rescoring in LVCSR.
1805-1808
- Fabio Tamburini, Petra Wagner:
On automatic prominence detection for German.
1809-1812
- Sankaranarayanan Ananthakrishnan, Shrikanth S. Narayanan:
Prosody-enriched lattices for improved syllable recognition.
1813-1816
- Joel Pinto, Andrew Lovitt, Hynek Hermansky:
Exploiting phoneme similarities in hybrid HMM-ANN keyword spotting.
1817-1820
- C. E. Liu, Kishan Thambiratnam, Frank Seide:
Online vocabulary adaptation using limited adaptation data.
1821-1824
Speech Recognition by Automatic Attribute Transcription
- Chin-Hui Lee, Mark A. Clements, Sorin Dusan, Eric Fosler-Lussier, Keith Johnson, Biing-Hwang Juang, Lawrence R. Rabiner:
An overview on automatic speech attribute transcription (ASAT).
1825-1828
- Ilana Bromberg, Qian Qian, Jun Hou, Jinyu Li, Chengyuan Ma, Brett Matthews, Antonio Moreno-Daniel, Jeremy Morris, Sabato Marco Siniscalchi, Yu Tsao, Yu Wang:
Detection-based ASR in the automatic speech attribute transcription project.
1829-1832
- Chi-Yueh Lin, Hsiao-Chuan Wang:
Attribute-based Mandarin speech recognition using conditional random fields.
1833-1836
- Helmer Strik, Khiet P. Truong, Febe de Wet, Catia Cucchiarini:
Comparing classifiers for pronunciation error detection.
1837-1840
- Jarek Krajewski, Bernd J. Kröger:
Using prosodic and spectral characteristics for sleepiness detection.
1841-1844
- Brian M. Ore, Raymond E. Slyh:
Score fusion for articulatory feature detection.
1845-1848
Speaker Diarization
First and Second Language Learning
- Wai-Sum Lee:
Tone production by the speakers of different age-and-gender groups.
1873-1876
- Nan Xu, Denis Burnham, Christine Kitamura:
Vowels and tones in infant directed speech: hyperarticulation for both, but different developmental patterns.
1877-1880
- Eon-Suk Ko:
Acquisition of vowel duration in children speaking American English.
1881-1884
- Hiroko Hirano, Keikichi Hirose, Goh Kawai, Wentao Gu, Nobuaki Minematsu:
F0 models show Chinese speakers of Japanese insert intonational boundaries and drop pitch.
1885-1888
- Paola Escudero, Jelle Kastelein, Klara A. Weiand, R. J. J. H. van Son:
Formal modelling of L1 and L2 perceptual learning: computational linguistics versus machine learning.
1889-1892
- Mirjam Broersma:
Kettle hinders cat, shadow does not hinder shed: activation of 'almost embedded' words in nonnative listening.
1893-1896
Speech Synthesis I, II
- Sacha Krstulovic, Anna Hunecke, Marc Schröder:
An HMM-based speech synthesis system applied to German and its adaptation to a limited set of expressive football announcements.
1897-1900
- Liang Gu, Wei Zhang, Lazkin Tahir, Yuqing Gao:
Statistical vowelization of Arabic text for speech synthesis in speech-to-speech translation systems.
1901-1904
- Wu Liu, Dezhi Huang, Yuan Dong, Xinnian Mao, Haila Wang:
A pair-based language model for the robust lexical analysis in Chinese text-to-speech synthesis.
1905-1908
- Ranniery Maia, Tomoki Toda, Heiga Zen, Yoshihiko Nankaku, Keiichi Tokuda:
A trainable excitation model for HMM-based speech synthesis.
1909-1912
- Jochen Steigner, Marc Schröder:
Cross-language phonemisation in German text-to-speech synthesis.
1913-1916
- Ryuki Tachibana, Tohru Nagano, Gakuto Kurata, Masafumi Nishimura, Noboru Babaguchi:
Preliminary experiments toward automatic generation of new TTS voices from recorded speech alone.
1917-1920
Phonetic Segmentation and Classification I, II
- Xiaochuan Niu, Jan P. H. van Santen:
Dual-channel acoustic detection of nasalization states.
1921-1924
- Tarun Pruthi, Carol Y. Espy-Wilson:
Acoustic parameters for the automatic detection of vowel nasalization.
1925-1928
- Jun Hou, Lawrence R. Rabiner, Sorin Dusan:
On the use of time-delay neural networks for highly accurate classification of stop consonants.
1929-1932
- Ladan Golipour, Douglas D. O'Shaughnessy:
A new approach for phoneme segmentation of speech signals.
1933-1936
- Veronique Stouten, Kris Demuynck, Hugo Van hamme:
Automatically learning the units of speech by non-negative matrix factorisation.
1937-1940
- Ozlem Kalinli, Shrikanth S. Narayanan:
A saliency-based auditory attention model with applications to unsupervised prominent syllable detection in speech.
1941-1944
- Sung Jun An, Young-Ik Kim, Rhee Man Kil:
Zero-crossing-based ratio masking for sound segregation.
1945-1948
- Satomi Tanaka, Minoru Tsuzaki, Hiroaki Kato, Yoshinori Sagisaka:
Event detection of speech signals based on auditory processing with a dynamic compressive gammachirp filterbank.
1949-1952
- Odette Scharenborg, Mirjam Ernestus, Vincent Wan:
Segmentation of speech: child's play?
1953-1956
- Andrew Errity, John McKenna, Barry Kirkpatrick:
Dimensionality reduction methods applied to both magnitude and phase derived features.
1957-1960
Voice Conversion and Modification
- Zdenek Hanzlícek, Jindrich Matousek:
F0 transformation within the voice conversion framework.
1961-1964
- Daniel Erro, Asunción Moreno:
Weighted frequency warping for voice conversion.
1965-1968
- Daniel Erro, Asunción Moreno:
Frame alignment method for cross-lingual voice conversion.
1969-1972
- Jani Nurminen, Jilei Tian, Victor Popa:
Voicing level control with application in voice conversion.
1973-1976
- Winston S. Percybrooks, Elliot Moore:
New algorithm for LPC residual estimation from LSF vectors for a voice conversion system.
1977-1980
- Yamato Ohtani, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano:
Speaker adaptive training for one-to-many eigenvoice conversion based on Gaussian mixture model.
1981-1984
- Petko N. Petkov, W. Bastiaan Kleijn:
Improving the phase vocoder approach to pitch-shifting.
1985-1988
- Larbi Mesbahi, Vincent Barreaud, Olivier Boëffard:
Comparing GMM-based speech transformation systems.
1989-1992
Speaker Verification & Identification I-IV
- Michael Gerber, René Beutler, Beat Pfister:
Quasi text-independent speaker-verification based on pattern matching.
1993-1996
- Yosef A. Solewicz, Moshe Koppel:
Virtual fusion for speaker recognition.
1997-2000
- Yi-Hsiang Chao, Wei-Ho Tsai, Shih-Sian Cheng, Hsin-Min Wang, Ruei-Chuan Chang:
Evolutionary minimum verification error learning of the alternative hypothesis model for LLR-based speaker verification.
2001-2004
- Seiichi Nakagawa, Kouhei Asakawa, Longbiao Wang:
Speaker recognition by combining MFCC and phase information.
2005-2008
- Sandeep Manocha, Carol Y. Espy-Wilson:
A semi-automatic approach for speaker mining of tapped telephone conversations.
2009-2012
- Hao Yang, Yuan Dong, Xianyu Zhao, Jian Zhao, Liang Lu, Haila Wang:
Cluster adaptive training weights as features in SVM-based speaker verification.
2013-2016
- Hideki Okamoto, Mariko Kojima, Tomoko Matsui, Hiromichi Kawanami, Hiroshi Saruwatari, Kiyohiro Shikano:
Study on speaker verification with non-audible murmur segments.
2017-2020
- Xugang Lu, Jianwu Dang:
Dimension reduction for speaker identification based on mutual information.
2021-2024
- Jonas Lindh, Anders Eriksson:
Robustness of long time measures of fundamental frequency.
2025-2028
- Vinod Prakash, John H. L. Hansen:
Score distribution scaling for speaker recognition.
2029-2032
- Andrew C. Morris, Jacques C. Koreman, B. Ly-Van, Harin Sellahewa, Sabah Jassim, R. Llarena Gómez:
Global features for rapid identity verification with dynamic biometric data.
2033-2036
- Tuan Van Pham, Michael Neffe, Gernot Kubin:
Robust voice activity detection for narrow-bandwidth speaker verification under adverse environments.
2037-2040
- Fernando Huenupán, Néstor Becerra Yoma, Carlos Molina, Claudio Garretón:
Speaker verification with multiple classifier fusion using Bayes based confidence measure.
2041-2044
- Girija Chetty, Michael Wagner:
Audiovisual speaker identity verification based on lip motion features.
2045-2048
- Gökhan Tür, Elizabeth Shriberg, Andreas Stolcke, Sachin S. Kajarekar:
Duration and pronunciation conditioned lexical modeling for speaker verification.
2049-2052
- Jean-François Bonastre, Driss Matrouf, Corinne Fredouille:
Artificial impostor voice transformation effects on false acceptance rates.
2053-2056
Improved Acoustic Modeling for ASR
- Jen-Wei Kuo, Hung-Yi Lo, Hsin-Min Wang:
Improved HMM/SVM methods for automatic phoneme segmentation.
2057-2060
- Takahiro Shinozaki, Tatsuya Kawahara:
Gaussian mixture optimization for HMM based on efficient cross-validation.
2061-2064
- Heiga Zen, Yoshihiko Nankaku, Keiichi Tokuda:
Model-space MLLR for trajectory HMMs.
2065-2068
- Hamed Ketabdar, Hervé Bourlard:
In-context phone posteriors as complementary features for tandem ASR.
2069-2072
- Qian Qian, Xiaodong He, Li Deng:
Phone-discriminating minimum classification error (p-MCE) training for phonetic recognition.
2073-2076
- Lori Lamel, Abdelkhalek Messaoudi, Jean-Luc Gauvain:
Improved acoustic modeling for transcribing Arabic broadcast data.
2077-2080
- Erik McDermott, Atsushi Nakamura:
String and lattice based discriminative training for the corpus of spontaneous Japanese lecture transcription task.
2081-2084
- Byung-Ok Kang, Ho-Young Jung, Yun-Keun Lee:
Discriminative noise adaptive training approach for an environment migration.
2085-2088
- Jia-Yu Chen, Peder A. Olsen, John R. Hershey:
Word confusability - measuring hidden Markov model similarity.
2089-2092
- Thomas Deselaers, Georg Heigold, Hermann Ney:
Speech recognition with state-based nearest neighbour classifiers.
2093-2096
- Remco Teunen, Masami Akamine:
HMM-based speech recognition using decision trees instead of GMMs.
2097-2100
- Christian Gollan, Stefan Hahn, Ralf Schlüter, Hermann Ney:
An improved method for unsupervised training of LVCSR systems.
2101-2104
- Mohamed Kamal Omar:
A variational approach to robust maximum likelihood estimation for speech recognition.
2105-2108
- Kai Yu, Rob A. Rutenbar:
Generating small, accurate acoustic models with a modified Bayesian information criterion.
2109-2112
- Peter Bell, Simon King:
Sparse Gaussian graphical models for speech recognition.
2113-2116
- Sakriani Sakti, Konstantin Markov, Satoshi Nakamura:
An HMM acoustic model incorporating various additional knowledge sources.
2117-2120
- Matti Varjokallio, Mikko Kurimo:
Comparison of subspace methods for Gaussian mixture models in speech recognition.
2121-2124
Multilingualism in Speech and Language Processing
- Tanja Schultz, Alan W. Black, Sameer Badaskar, Matthew Hornyak, John Kominek:
SPICE: web-based tools for rapid language adaptation in speech processing systems.
2125-2128
- Filip Deprez, Jan Odijk, Jan De Moortel:
Introduction to multilingual corpus-based concatenative speech synthesis.
2129-2132
- Frederik Stouten, Jean-Pierre Martens:
Recognition of foreign names spoken by native speakers.
2133-2136
- Ricardo de Córdoba, Luis Fernando D'Haro, Fernando F. Fernández-Martínez, Juan Manuel Montero, Roberto Barra-Chicote:
Language identification using several sources of information with a multiple-Gaussian classifier.
2137-2140
- Carmen del Solar, Guillermo Pérez, Eva Florencio, David Moral, Gabriel Amores Carredano, Pilar Manchón Portillo:
Dynamic language change in MIMUS.
2141-2144
Systems for LVCSR and Rich Transcription I, II
- Jonas Lööf, Christian Gollan, Stefan Hahn, Georg Heigold, Björn Hoffmeister, Christian Plahl, David Rybach, Ralf Schlüter, Hermann Ney:
The RWTH 2007 TC-STAR evaluation system for European English and Spanish.
2145-2148
- Chin-Wei Eugene Koh, Hanwu Sun, Tin Lay Nwe, Trung Hieu Nguyen, Bin Ma, Engsiong Chng, Haizhou Li, Susanto Rahardja:
Using direction of arrival estimate and acoustic feature information in speaker diarization.
2149-2152
- Fernando Batista, Diamantino Caseiro, Nuno J. Mamede, Isabel Trancoso:
Recovering punctuation marks for automatic speech recognition.
2153-2156
- Jui-Feng Yeh, Chung-Hsien Wu, Wei-Yen Wu:
Disfluency correction of spontaneous speech using conditional random fields with variable-length features.
2157-2160
- Jing Huang, Etienne Marcheret, Karthik Visweswariah, Vit Libal, Gerasimos Potamianos:
Detection, diarization, and transcription of far-field lecture speech.
2161-2164
- Timothy J. Hazen, Brennan Sherry, Mark Adler:
Speech-based annotation and retrieval of digital photographs.
2165-2168
Language Learning and Assessment
- Joseph Tepperman, Abe Kazemzadeh, Shrikanth S. Narayanan:
A text-free approach to assessing nonnative intonation.
2169-2172
- John Lee, Stephanie Seneff:
Automatic generation of cloze items for prepositions.
2173-2176
- Christopher J. Waple, Hongcui Wang, Tatsuya Kawahara, Yasushi Tsubota, Masatake Dantsuji:
Evaluating and optimizing Japanese tutor system featuring dynamic question generation and interactive guidance.
2177-2180
- Catia Cucchiarini, Ambra Neri, Febe de Wet, Helmer Strik:
ASR-based pronunciation training: scoring accuracy and pedagogical effectiveness of a system for Dutch L2 learners.
2181-2184
- Joseph Tepperman, Matthew Black, Patti Price, Sungbok Lee, Abe Kazemzadeh, Matteo Gerosa, Margaret Heritage, Abeer Alwan, Shrikanth S. Narayanan:
A Bayesian network classifier for word-level reading assessment.
2185-2188
Multimodal Interaction: Analysis and Technology
- Hartwig Holzapfel, Alex Waibel:
Behavior models for learning and receptionist dialogs.
2189-2192
- Markku Turunen, Jaakko Hakulinen, Anssi Kainulainen, Aleksi Melto, Topi Hurtig:
Design of a rich multimodal interface for mobile spoken route guidance.
2193-2196
- Mariët Theune, Dennis Hofs, Marco van Kessel:
The virtual guide: a direction giving embodied conversational agent.
2197-2200
- Sudeep Gandhe, David R. Traum:
Creating spoken dialogue characters from corpora without annotations.
2201-2204
- Pui-Yu Hui, Zhengyu Zhou, Helen M. Meng:
Complementarity and redundancy in multimodal user inputs with speech and pen gestures.
2205-2208
- Linda Bell, Joakim Gustafson:
Children's convergence in referring expressions to graphical objects in a speech-enabled computer game.
2209-2212
Emotion
- Hiromi Kawatsu, Sumio Ohno:
An analysis of individual differences in the F0 contour and the duration of anger utterances at several degrees.
2213-2216
- Yoshiko Arimoto, Sumio Ohno, Hitoshi Iida:
Acoustic features of anger utterances during natural dialog.
2217-2220
- Fadi Biadsy, Julia Hirschberg, Andrew Rosenberg, Wisam Dakka:
Comparing American and Palestinian perceptions of charisma using acoustic-prosodic and lexical analysis.
2221-2224
- Carlos Busso, Sungbok Lee, Shrikanth S. Narayanan:
Using neutral speech models for emotional speech analysis.
2225-2228
- N. Satoh, Katsuya Yamauchi, Shoichi Matsunaga, Masaru Yamashita, R. Nakagawa, Kazuyuki Shinohara:
Emotion clustering using the results of subjective opinion tests for emotion recognition in infants' cries.
2229-2232
- Roberto Barra-Chicote, Juan Manuel Montero, Javier Macías Guarasa, Juana M. Gutiérrez-Arriola, Javier Ferreiros, Juan Manuel Pardo:
On the limitations of voice conversion techniques in emotion identification tasks.
2233-2236
- Kate Dupuis, Kathleen Pichora-Fuller:
Use of lexical and affective prosodic cues to emotion by younger and older adults.
2237-2240
- Purnima Gupta, Nitendra Rajput:
Two-stream emotion recognition for call center monitoring.
2241-2244
- Ioulia Grichkovtsova, Anne Lacheret, Michel Morel:
The role of intonation and voice quality in the affective speech perception.
2245-2248
- Bogdan Vlasenko, Björn Schuller, Andreas Wendemuth, Gerhard Rigoll:
Combining frame and turn-level information for robust recognition of emotions within speech.
2249-2252
Speakers: Expression, Emotion and Personality Recognition
- Björn Schuller, Anton Batliner, Dino Seppi, Stefan Steidl, Thurid Vogt, Johannes Wagner, Laurence Devillers, Laurence Vidrascu, Noam Amir, Loïc Kessous, Vered Aharonson:
The relevance of feature type for the automatic classification of emotional user states: low level descriptors and functionals.
2253-2256
- Minh-Quang Vu, Laurent Besacier, Eric Castelli:
Automatic question detection: prosodic-lexical features and crosslingual experiments.
2257-2260
- Makoto Tachibana, Keigo Kawashima, Junichi Yamagishi, Takao Kobayashi:
Performance evaluation of HMM-based style classification with a small amount of training data.
2261-2264
- Khiet P. Truong, David A. van Leeuwen:
Visualizing acoustic similarities between emotions in speech: an acoustic map of emotions.
2265-2268
- Hao Hu, Ming-Xing Xu, Wei Wu:
Fusion of global statistical and segmental spectral features for speech emotion recognition.
2269-2272
- Vidhyasaharan Sethu, Eliathamby Ambikairajah, Julien Epps:
Group delay features for emotion detection.
2273-2276
- Christian Müller, Felix Burkhardt:
Combining short-term cepstral and long-term pitch features for automatic recognition of speaker age.
2277-2280
- Frank Enos, Elizabeth Shriberg, Martin Graciarena, Julia Hirschberg, Andreas Stolcke:
Detecting deception using critical segments.
2281-2284
- Takashi Nose, Yoichi Kato, Takao Kobayashi:
Style estimation of speech based on multiple regression hidden semi-Markov model.
2285-2288
- Chi Zhang, John H. L. Hansen:
Analysis and classification of speech mode: whispered through shouted.
2289-2292
First Language, Second Language, Cross-language
- Melissa Bettoni-Techio, Andréia S. Rauber, Rosana Denise Koerich:
Perception and production of word-final alveolar stops by Brazilian Portuguese learners of English.
2293-2296
- Denise Cristina Kluge, Andréia S. Rauber, Mara Silvia Reis, Ricardo Augusto Hoffmann Bion:
The relationship between the perception and production of English nasal codas by Brazilian learners of English.
2297-2300
- Takafumi Utashiro, Goh Kawai:
CALL courseware for learning reactive tokens in face-to-face dialogs.
2301-2304
- Shinya Kiriyama, Ryo Tsuji, Tomohiko Kasami, Shogo Ishikawa, Naofumi Otani, Hiroaki Horiuchi, Yoichi Takebayashi, Shigeyoshi Kitazawa:
The developmental analysis of demonstrative expression skills utilizing a multimodal infant behavior corpus.
2305-2308
- Elena E. Lyakso, Olga V. Frolova:
Russian vowels system acoustic features development in ontogenesis.
2309-2312
- Petra van Alphen, Elise de Bree, Paula Fikkert, Frank Wijnen:
The role of metrical stress in comprehension and production in Dutch children at-risk of dyslexia.
2313-2316
- Seiichi Nakagawa, Kei Ohta:
A statistical method of evaluating pronunciation proficiency for presentation in English.
2317-2320
- Akiyo Joto, Yoshiki Nagase, Seiya Funatsu:
The intelligibility and its relations to acoustic characteristics of English /s/ and /ʃ/ produced by native speakers of Japanese.
2321-2324
- Martijn Goudbeek, Daniel Swingley, Keith R. Kluender:
The limits of multidimensional category learning.
2325-2328
- Maria Uther, James Uther, Panos Athanasopoulos, Pushpendra Singh, Reiko Akahane-Yamada:
Mobile adaptive CALL (MAC): a lightweight speech-based intervention for mobile language learners.
2329-2332
- Catherine T. Best, Pierre A. Hallé, Jennifer S. Pardo:
English and French speakers' perception of voicing distinctions in non-native lateral consonant syllable onsets.
2333-2336
- Francisco Lacerda, Lisa Gustavsson:
Predicting the consequences of vocalizations in early infancy.
2337-2340
- David Weenink, Guangqin Chen, Zongyan Chen, Stefan de Konink, Dennis Vierkant, Eveline van Hagen, R. J. J. H. van Son:
Learning tone distinctions for Mandarin Chinese.
2341-2344
- Catherine Lai, Kyle Gorman, Jiahong Yuan, Mark Liberman:
Perception of disfluency: language differences and listener bias.
2345-2348
Language Modeling I, II
- Hiroki Yamazaki, Koji Iwano, Koichi Shinoda, Sadaoki Furui, Haruo Yokota:
Dynamic language model adaptation using presentation slides for lecture speech recognition.
2349-2352
- Cosmin Munteanu, Gerald Penn, Ronald Baecker:
Web-based language modelling for automatic lecture transcription.
2353-2356
- Tanel Alumäe, Toomas Kirt:
LSA-based language model adaptation for highly inflected languages.
2357-2360
- Aaron Heidel, Hung-an Chang, Lin-Shan Lee:
Language model adaptation using latent Dirichlet allocation and an efficient topic inference algorithm.
2361-2364
- Sibel Yaman, Jen-Tzung Chien, Chin-Hui Lee:
Structural Bayesian language modeling and adaptation.
2365-2368
- Ciro Martins, António J. S. Teixeira, João Paulo Neto:
Vocabulary selection for a broadcast news transcription system using a morpho-syntactic approach.
2369-2372
- Nguyen Bach, Mohamed Noamany, Ian R. Lane, Tanja Schultz:
Handling OOV words in Arabic ASR via flexible morphological constraints.
2373-2376
- Raquel Justo, M. Inés Torres:
Phrases in category-based language models for Spanish and Basque ASR.
2377-2380
- Ebru Arisoy, Hasim Sak, Murat Saraclar:
Language modeling for automatic Turkish broadcast news transcription.
2381-2384
Spoken Data Retrieval I, II
- Roy Wallace, Robbie Vogt, Sridha Sridharan:
A phonetic search approach to the 2006 NIST spoken term detection evaluation.
2385-2388
- Yoshiaki Itoh, Kohei Iwata, Kazunori Kojima, Masaaki Ishigame, Kazuyo Tanaka, Shi-wook Lee:
An integration method of retrieval results using plural subword models for vocabulary-free spoken document retrieval.
2389-2392
- Dimitra Vergyri, Izhak Shafran, Andreas Stolcke, Venkata Ramana Rao Gadde, Murat Akbacak, Brian Roark, Wen Wang:
The SRI/OGI 2006 spoken term detection system.
2393-2396
- Masataka Goto, Jun Ogata, Kouichirou Eto:
PodCastle: a Web 2.0 approach to speech recognition research.
2397-2400
- Nathalie Camelin, Frédéric Béchet, Géraldine Damnati, Renato de Mori:
Speech mining in noisy audio message corpus.
2401-2404
- Jian Shao, Qingwei Zhao, Pengyuan Zhang, Zhaojie Liu, Yonghong Yan:
A fast fuzzy keyword spotting algorithm based on syllable confusion network.
2405-2408
- Wooil Kim, John H. L. Hansen:
Advances in SpeechFind: transcript reliability estimation employing confidence measure based on discriminative sub-word model for SDR.
2409-2412
- Benoît Favre, Jean-François Bonastre, Patrice Bellot:
An interactive timeline for speech database browsing.
2413-2416
Novel Techniques for the NATO Non-native Air-traffic Control and HIWIRE Cockpit Databases
- Stéphane Pigeon, Wade Shen, Aaron D. Lawson, David A. van Leeuwen:
Design and characterization of the non-native military air traffic communications database (nnMATC).
2417-2420
- Wade Shen, Douglas A. Reynolds:
A comparison of speaker clustering and speech recognition techniques for air situational awareness.
2421-2424
- Dimitrios Dimitriadis, José C. Segura, Luz García, Alexandros Potamianos, Petros Maragos, Vassilis Pitsikalis:
Advanced front-end for robust speech recognition in extremely adverse environments.
2425-2428
- Roberto Gemello, Franco Mana, Stefano Scanzio:
Experiments on HIWIRE database using denoising and adaptation with a hybrid HMM-ANN model.
2429-2432
- Brett Y. Smolenski:
Detection and removal of switching noise in push-to-talk and voice operated exchange communications systems.
2433-2436
- Luis Buera, Antonio Miguel, Oscar Saz, Eduardo Lleida, Alfonso Ortega:
Evaluation of the combined use of MEMLIN and MLLR on the non-native adaptation task of HIWIRE project database.
2437-2440
Systems for Spoken Language Translation I, II
- Daniel Déchelotte, Holger Schwenk, Gilles Adda, Jean-Luc Gauvain:
Improved machine translation of speech-to-text outputs.
2441-2444
- Shirin Saleem, Krishna Subramanian, Rohit Prasad, David Stallard, Chia-Lin Kao, Prem Natarajan, R. Suleiman:
Improvements in machine translation for English/Iraqi speech translation.
2445-2448
- Evgeny Matusov, Dustin Hillard, Mathew Magimai-Doss, Dilek Z. Hakkani-Tür, Mari Ostendorf, Hermann Ney:
Improving speech translation with automatic boundary prediction.
2449-2452
- Roldano Cattoni, Nicola Bertoldi, Marcello Federico:
Punctuating confusion networks for speech translation.
2453-2456
- Aarthi Reddy, Richard C. Rose, Alain Désilets:
Integration of ASR and machine translation models in a document translation task.
2457-2460
- Yik-Cheung Tam, Tanja Schultz:
Bilingual LSA-based translation lexicon adaptation for spoken language translation.
2461-2464
Articulatory Features
Wideband Speech Processing
- Amr H. Nour-Eldin, Peter Kabal:
Objective analysis of the effect of memory inclusion on bandwidth extension of narrowband speech.
2489-2492
- Bernd Geiser, Hervé Taddei, Peter Vary:
Artificial bandwidth extension without side information for ITU-T G.729.1.
2493-2496
- Hannu Pulakka, Paavo Alku, Laura Laaksonen, Päivi Valve:
The effect of highband harmonic structure in the artificial bandwidth expansion of telephone speech.
2497-2500
- Shingo Kuroiwa, Masashi Takashina, Satoru Tsuge, Fuji Ren:
Artificial bandwidth extension for speech signals using speech recognition.
2501-2504
- Driss Guerchi, Tamer F. Rabie, Abdelrhani Louzi:
Voicing-based codebook in low-rate wideband CELP coding.
2505-2508
- Ethan R. Duni, Bhaskar D. Rao:
Performance of speaker-dependent wideband speech coding.
2509-2512
Accessibility Issues
- Philippe Dreuw, David Rybach, Thomas Deselaers, Morteza Zahedi, Hermann Ney:
Speech recognition techniques for a sign language recognition system.
2513-2516
- Keigo Nakamura, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano:
Impact of various small sound source signals on voice conversion accuracy in speech communication aid for laryngectomees.
2517-2520
- Petr Cerva, Jan Nouza:
Design and development of voice controlled aids for motor-handicapped persons.
2521-2524
- Kouichi Katsurada, Yuji Okuma, Makoto Yano, Yurie Iribe, Tsuneo Nitta:
Management of static/dynamic properties in a multimodal interaction system.
2525-2528
- Rubén San Segundo, Alicia Pérez, Daniel Ortiz, Luis Fernando D'Haro, M. Inés Torres, Francisco Casacuberta:
Evaluation of alternatives on speech to sign language translation.
2529-2532
- Géza Németh, Gábor Olaszy, Mátyás Bartalis, Géza Kiss, Csaba Zainkó, Péter Mihajlik:
Speech based drug information system for aged and visually impaired persons.
2533-2536
- Waldo Nogueira Vazquez, Tamás Harczos, Bernd Edler, Jörn Ostermann, Andreas Büchner:
Automatic speech recognition with a cochlear implant front-end.
2537-2540
- Soo-Young Suk, Hiroaki Kojima:
Voice activated powered wheelchair with non-voice rejection algorithm.
2541-2544
- Laurianne Sitbon, Patrice Bellot, Philippe Blache:
Phonetic based sentence level rewriting of questions typed by dyslexic spellers in an information retrieval context.
2545-2548
New Application Areas
- André Berton, Peter Regel-Brietzmann, Hans Ulrich Block, Stefanie Schachtl, Manfred Gehrke:
How to integrate speech-operated internet information dialogs into a car.
2549-2552
- James R. Glass, Timothy J. Hazen, D. Scott Cyphers, Igor Malioutov, David Huynh, Regina Barzilay:
Recent progress in the MIT spoken lecture processing project.
2553-2556
- Philipp Fischer, Andreas Österle, André Berton, Peter Regel-Brietzmann:
How to personalize speech applications for web-based information in a car.
2557-2560
- Satoshi Ikeda, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno:
Topic estimation with domain extensibility for guiding user's out-of-grammar utterances in multi-domain spoken dialogue systems.
2561-2564
- Ryota Nishimura, Norihide Kitaoka, Seiichi Nakagawa:
Prosody change and response timing analysis in spontaneously spoken dialogs and their modeling in a spoken dialog system.
2565-2568
- Satoshi Tamura, Kunihiko Takamatsu, Shinji Ogura, Satoru Hayamizu:
GEMSIS - a novel application of speech recognition to emergency and disaster medicine.
2569-2572
- Rachel Coulston, Esther Klabbers, Jacques de Villiers, John-Paul Hosom:
Application of speech technology in a home based assessment kiosk for early detection of Alzheimer's disease.
2573-2576
- Olga Vybornova, Monica Gemo, Ronald Moncarey, Benoit M. Macq:
Ontology-based multimodal high level fusion involving natural language analysis for aged people home care application.
2577-2580
Story Segmentation
- Shing-kai Chan, Lei Xie, Helen M. Meng:
Modeling the statistical behavior of lexical chains to capture word cohesiveness for automatic story segmentation.
2581-2584
- James G. Fung, Dilek Z. Hakkani-Tür, Mathew Magimai-Doss, Elizabeth Shriberg, Sébastien Cuendet, Nikki Mirghafori:
Cross-linguistic analysis of prosodic features for sentence segmentation.
2585-2588
- Andrew Rosenberg, Mehrbod Sharifi, Julia Hirschberg:
Varying input segmentation for story boundary detection in English, Arabic and Mandarin broadcast news.
2589-2592
- BalaKrishna Kolluru, Yoshihiko Gotoh:
Speaker role based structural classification of broadcast news stories.
2593-2596
Systems for LVCSR and Rich Transcription I, II
- Ümit Güz, Sébastien Cuendet, Dilek Z. Hakkani-Tür, Gökhan Tür:
Co-training using prosodic and lexical information for sentence segmentation.
2597-2600
- Yannick Estève, Sylvain Meignier, Paul Deléglise, Julie Mauclair:
Extracting true speaker identities from transcriptions.
2601-2604
- Rong Fu, Ian D. Benest:
An improved speaker diarization system.
2605-2608
- Sebastian Stüker, Christian Fügen, Florian Kraft, Matthias Wölfel:
The ISL 2007 English speech transcription system for European Parliament speeches.
2609-2612
- Mei-Yuh Hwang, Wen Wang, Xin Lei, Jing Zheng, Özgür Çetin, Gang Peng:
Advances in Mandarin broadcast speech recognition.
2613-2616
- Jun Ogata, Masataka Goto, Kouichirou Eto:
Automatic transcription for a Web 2.0 service to search podcasts.
2617-2620
Prosody: Production
- Matthias Jilka, Bernd Möbius:
The influence of vowel quality features on peak alignment.
2621-2624
- Yen-Liang Shue, Markus Iseli, Nanette Veilleux, Abeer Alwan:
Pitch accent versus lexical stress: quantifying acoustic measures related to the voice source.
2625-2628
- Stefan Benus, Agustín Gravano, Julia Hirschberg:
Prosody, emotions, and... 'whatever'.
2629-2632
- Wentao Gu, Rerrario Shui-Ching Ho, Tan Lee:
Modeling tones in Hakka on the basis of the command-response model.
2633-2636
- Gerrit Kentner:
Length, ordering preference and intonational phrasing: evidence from pauses.
2637-2640
- Jörg Peters, Judith Hanssen, Carlos Gussenhoven:
Alignment of the second low target in Dutch falling-rising pitch contours.
2641-2644
- Helena Moniz, Ana Isabel Mata, Céu Viana:
On filled-pauses and prolongations in European Portuguese.
2645-2648
Prosody: Perception
- Michael Olsberg, Yi Xu, Jeremy Green:
Dependence of tone perception on syllable perception.
2649-2652
- Ralf Winkler:
Testing the relevance of speech rate, pitch and a glottal chink for the perception of age in synthesized speech using formant synthesis.
2653-2656
- Tamás Böhm, Stefanie Shattuck-Hufnagel:
Utterance-final glottalization as a cue for familiar speaker recognition.
2657-2660
- Chun-Fang Huang, Masato Akagi:
A rule-based speech morphing for verifying an expressive speech perception model.
2661-2664
- Elina Helander, Jani Nurminen:
On the importance of pure prosody in the perception of speaker identity.
2665-2668
- Shi-Han Chen, Chih-Chung Kuo:
Perceptual relevance of pitch contours of Mandarin tones and its efficacy in prosody generation of speech synthesis.
2669-2672
- Hiromitsu Nishizaki, Mitsuhiro Somiya, Kenji Kobayashi, Yoshihiro Sekiguchi:
The effect of filled pauses in a lecture speech on impressive evaluation of listeners.
2673-2676
- Yujia Li, Tan Lee:
Perceptual equivalence of approximated Cantonese tone contours.
2677-2680
- Suleman Shahid, Emiel Krahmer, Marc Swerts:
Audiovisual emotional speech of game playing children: effects of age and culture.
2681-2684
Machine Learning for Spoken Dialog Systems
Spoken Dialog Systems I, II
- Dong Yu, Yun-Cheng Ju, Ye-Yi Wang, Geoffrey Zweig, Alex Acero:
Automated directory assistance system - from theory to practice.
2709-2712
- Geoffrey Zweig, Patrick Nguyen, Yun-Cheng Ju, Ye-Yi Wang, Dong Yu, Alex Acero:
The voice-rate dialog system for consumer ratings.
2713-2716
- Andi Winterboer, Jiang Hu, Johanna D. Moore, Clifford Nass:
The influence of user tailoring and cognitive load on user performance in spoken dialogue systems.
2717-2720
- Ye-Yi Wang, Dong Yu, Yun-Cheng Ju, Geoffrey Zweig, Alex Acero:
Confidence measures for voice search applications.
2721-2724
- Ryuichiro Higashinaka, Kohji Dohsaka, Shigeaki Amano, Hideki Isozaki:
Effects of quiz-style information presentation on user understanding.
2725-2728
- Hong-Kwang Jeff Kuo, Vaibhava Goel:
A data visualization and analysis method for natural language call routing system design.
2729-2732
Phonetics
- Christiane Ulbrich, Horst Ulbrich:
Realisations and alternations in German /r/-realisation.
2733-2736
- Christopher S. Doty, Kaori Idemaru, Susan G. Guion:
Singleton and geminate stops in Finnish - acoustic correlates.
2737-2740
- Christophe Van Bael, R. Harald Baayen, Helmer Strik:
Segment deletion in spontaneous speech: a corpus study using mixed effects models with crossed random effects.
2741-2744
- Hongying Zheng, Peter W. M. Tsang, William S.-Y. Wang:
Categorical perception of Cantonese tones in context: a cross-linguistic study.
2745-2748
- Yiya Chen, Jiahong Yuan:
A corpus study of the 3rd tone sandhi in standard Chinese.
2749-2752
- Jonathan Harrington, Sallyanne Palethorpe, Catherine I. Watson:
Age-related changes in fundamental frequency and formants: a longitudinal study of four speakers.
2753-2756
Pitch Extraction I, II
- Jasha Droppo, Alex Acero:
A fine pitch model for speech.
2757-2760
- Prasanta Kumar Ghosh, Antonio Ortega, Shrikanth S. Narayanan:
Pitch period estimation using multipulse model and wavelet transform.
2761-2764
- Martin Heckmann, Frank Joublin, Christian Goerick:
Combining rate and place information for robust pitch extraction.
2765-2768
- Heidi Christensen, Ning Ma, Stuart N. Wrigley, Jon Barker:
Integrating pitch and localisation cues at a speech fragment level.
2769-2772
- Jean-Sylvain Liénard, François Signol, Claude Barras:
Speech fundamental frequency estimation using the alternate comb.
2773-2776
- Andrew Rosenberg, Julia Hirschberg:
Detecting pitch accent using pitch-corrected energy-based predictors.
2777-2780
Spoken Language Understanding and Summarization
- Jian Zhang, Ricky Ho Yin Chan, Pascale Fung, Lu Cao:
A comparative study on speech summarization of broadcast news and lecture speech.
2781-2784
- Gabriel Murray, Steve Renals:
Towards online speech summarization.
2785-2788
- Tomoyuki Yamagata, Atsushi Sako, Tetsuya Takiguchi, Yasuo Ariki:
System request detection in conversation based on acoustic and speaker alternation features.
2789-2792
- Michael Levit, Elizabeth Boschee, Marjorie Freedman:
Selecting on-topic sentences from natural language corpora.
2793-2796
- Seokhwan Kim, Minwoo Jeong, Gary Geunbae Lee:
A semi-supervised method for efficient construction of statistical spoken language understanding resources.
2797-2800
- Yasuhisa Fujii, Norihide Kitaoka, Seiichi Nakagawa:
Automatic extraction of cue phrases for important sentences in lecture speech and automatic lecture speech summarization.
2801-2804
- Yi-Ting Chen, Hsuan-Sheng Chiu, Hsin-Min Wang, Berlin Chen:
A unified probabilistic generative framework for extractive spoken document summarization.
2805-2808
- Matthieu Hébert:
Generic class-based statistical language models for robust speech understanding in directed dialog applications.
2809-2812
- Michael L. Seltzer, Yun-Cheng Ju, Ivan Tashev, Alex Acero:
Robust location understanding in spoken dialog systems using intersections.
2813-2816
Systems for Spoken Language Translation I, II
- David Stallard, Fred Choi, Chia-Lin Kao, Kriste Krstovski, Premkumar Natarajan, Rohit Prasad, Shirin Saleem, Krishna Subramanian:
The BBN 2007 displayless English/Iraqi speech-to-speech translation system.
2817-2820
- Ruhi Sarikaya, Yonggang Deng, Yuqing Gao:
Context dependent word modeling for statistical machine translation using part-of-speech tags.
2821-2824
- Darren Scott Appling, Nick Campbell:
Translating conversational speech to standard linguistic form.
2825-2828
- Caroline Lavecchia, Kamel Smaïli, David Langlois, Jean Paul Haton:
Using inter-lingual triggers for machine translation.
2829-2832
- Daniele Falavigna, Nicola Bertoldi, Fabio Brugnara, Roldano Cattoni, Mauro Cettolo, Boxing Chen, Marcello Federico, Diego Giuliani, Roberto Gretter, Deepa Gupta, Dino Seppi:
The IRST English-Spanish translation system for European Parliament speeches.
2833-2836
- Christian Fügen, Muntsin Kolss:
The influence of utterance chunking on machine translation performance.
2837-2840
- Kristin Precoda, Jing Zheng, Dimitra Vergyri, Horacio Franco, Colleen Richey, Andreas Kathol, Sachin S. Kajarekar:
IraqComm: a next generation translation system.
2841-2844
- Sharath Rao, Ian R. Lane, Tanja Schultz:
Optimizing sentence segmentation for spoken language translation.
2845-2848
Speech Synthesis I, II
- Suphattharachai Chomphan, Takao Kobayashi:
Implementation and evaluation of an HMM-based Thai speech synthesis system.
2849-2852
- Davide Bonardo, Enrico Zovato:
Speech synthesis enhancement in noisy environments.
2853-2856
- Helmut Schmid, Bernd Möbius, Julia Weidenkaff:
Tagging syllable boundaries with joint n-gram models.
2857-2860
- Jun Xu, Dezhi Huang, Yongxin Wang, Yuan Dong, Lianhong Cai, Haila Wang:
Hierarchical non-uniform unit selection based on prosodic structure.
2861-2864
- Peter Birkholz:
Control of an articulatory speech synthesizer based on dynamic approximation of spatial articulatory targets.
2865-2868
- Nobuyuki Nishizawa, Hisashi Kawai:
A preselection method based on cost degradation from the optimal sequence for concatenative speech synthesis.
2869-2872
- Guntram Strecha, Matthias Eichner, Rüdiger Hoffmann:
Line cepstral quefrencies and their use for acoustic inventory coding.
2873-2876
- Peter Cahill, Daniel Aioanei, Julie Carson-Berndsen:
Articulatory acoustic feature applications in speech synthesis.
2877-2880
- Aleksandra Krul, Géraldine Damnati, François Yvon, Cédric Boidin, Thierry Moudenc:
Approaches for adaptive database reduction for text-to-speech synthesis.
2881-2884
- Richard Tzong-Han Tsai, Hsi-Chuan Hung, Hong-Jie Dai, Wen-Lian Hsu:
Exploiting unlabeled internal data in conditional random fields to reduce word segmentation errors for Chinese texts.
2885-2888
- Barry Kirkpatrick, Darragh O'Brien, Ronan Scaife, Andrew Errity:
On the role of spectral dynamics in unit selection speech synthesis.
2889-2892
- Brian Langner, Alan W. Black:
uGloss: a framework for improving spoken language generation understandability.
2893-2896
- Karl Schnell, Arild Lacroix:
Combination of LSF and pole based parameter interpolation for model-based diphone concatenation.
2897-2900
- Kishore Prahallad, Arthur R. Toth, Alan W. Black:
Automatic building of synthetic voices from large multi-paragraph speech databases.
2901-2904
- Ascensión Gallardo-Antolín, Roberto Barra-Chicote, Marc Schröder, Sacha Krstulovic, Juan Manuel Montero:
Automatic phonetic segmentation of Spanish emotional speech.
2905-2908
- Dacheng Lin, Yong Zhao, Frank K. Soong, Min Chu, Jieyu Zhao:
Iterative unit selection with unnatural prosody detection.
2909-2912
Voice Activity Detection and Sound Classification
- Maria E. Markaki, Michael Wohlmayr, Yannis Stylianou:
Speech-nonspeech discrimination using the information bottleneck method and spectro-temporal modulation index.
2913-2916
- Keun Won Jang, Dong Kook Kim, Joon-Hyuk Chang:
A uniformly most powerful test for statistical model-based voice activity detection.
2917-2920
- John Dines, Jithendra Vepa:
Direct optimisation of a multilayer perceptron for the estimation of cepstral mean and variance statistics.
2921-2924
- Marijn Huijbregts, Chuck Wooters, Roeland Ordelman:
Filtering the unknown: speech activity detection in heterogeneous video collections.
2925-2928
- Abhijeet Sangwan, Nitish Krishnamurthy, John H. L. Hansen:
Environmentally aware voice activity detector.
2929-2932
- Masakiyo Fujimoto, Kentaro Ishizuka:
Noise robust voice activity detection based on switching Kalman filter.
2933-2936
- Q-Haing Jo, Yun-Sik Park, Kye-Hwan Lee, Ji-Hyun Song, Joon-Hyuk Chang:
Voice activity detection based on support vector machine using effective feature vectors.
2937-2940
- K. Sri Rama Murty, B. Yegnanarayana, S. Guruprasad:
Voice activity detection in degraded speech using excitation source information.
2941-2944
- David Cournapeau, Tatsuya Kawahara:
Evaluation of real-time voice activity detection based on high order statistics.
2945-2948
- Yanmeng Guo, Qian Qian, Yonghong Yan:
Robust voice activity detection based on adaptive sub-band energy sequence analysis and harmonic detection.
2949-2952
- Corinne Fredouille, Nicholas Evans:
The influence of speech activity detection and overlap on speaker diarization for meeting room recordings.
2953-2956
- Gibak Kim, Nam Ik Cho:
Voice activity detection using the phase vector in microphone array.
2957-2960
- Federico Flego, Christian Zieger, Maurizio Omologo:
Adaptive weighting of microphone arrays for distant-talking F0 and voiced/unvoiced estimation.
2961-2964
- A. Sreenivasa Murthy, S. Chandra Sekhar, Thippur V. Sreenivas:
Robust and high-resolution voiced/unvoiced classification in noisy speech using a signal smoothness criterion.
2965-2968
- Tara N. Sainath, Victor Zue, Dimitri Kanevsky:
Audio classification using extended Baum-Welch transformations.
2969-2972
- Mary Tai Knox, Nikki Mirghafori:
Automatic laughter detection using neural networks.
2973-2976
- Gang Peng, Mei-Yuh Hwang, Mari Ostendorf:
Automatic acoustic segmentation for speech recognition on broadcast recordings.
2977-2980
Unreviewed Papers for Special Sessions
- Peter Birkholz:
Articulatory synthesis of singing.
4001-4004
- Takeshi Saitou, Masataka Goto, Masashi Unoki, Masato Akagi:
Vocal conversion from speaking voice to singing voice using STRAIGHT.
4005-4006
- Axel Röbel, Joshua Fineberg:
Speech to chant transformation with the phase vocoder.
4007-4008
- Hideki Kenmochi, Hayato Ohshita:
VOCALOID - commercial singing synthesizer based on sample concatenation.
4009-4010
- Nicolas D'Alessandro, Thierry Dutoit:
RAMCESS/HandSketch: a multi-representation framework for realtime and expressive singing synthesis.
4011-4012
- Sten Ternström, Johan Sundberg:
Formant-based synthesis of singing.
4013-4014
- Han Sloetjes, Albert Russel, Alexander Klassmann:
ELAN: a free and open-source multimedia annotation tool.
4015-4016
- Jozsef Szakos, Ulrike Glavitsch:
SpeechIndexer in action: managing endangered Formosan languages.
4017-4019
- Tohru Ifukube, Yasuyuki Shimizu:
A portable record player for wax cylinders using a laser-beam reflection method.
4020