INTERSPEECH 2004: Jeju Island, Korea
INTERSPEECH 2004 - ICSLP, 8th International Conference on Spoken Language Processing, Jeju Island, Korea, October 4-8, 2004. ISCA 2004
Plenary Talks
Chin-Hui Lee: From decoding-driven to detection-based paradigms for automatic speech recognition.
Hyun-Bok Lee: In search of a universal phonetic alphabet - theory and application of an organic visible speech-.
Jacqueline Vaissière: From X-ray or MRI data to sounds through articulatory synthesis: towards an integrated view of the speech communication process.
Speech Recognition - Adaptation
Sreeram Balakrishnan, Karthik Visweswariah, Vaibhava Goel: Stochastic gradient adaptation of front-end parameters.
Antoine Raux, Rita Singh: Maximum-likelihood adaptation of semi-continuous HMMs by latent variable decomposition of state distributions.
Chao Huang, Tao Chen, Eric Chang: Transformation and combination of hidden Markov models for speaker selection training.
Nikos Chatzichrisafis, Vassilios Digalakis, Vassilios Diakoloukas, Costas Harizakis: Rapid acoustic model development using Gaussian mixture clustering and language adaptation.
Karthik Visweswariah, Ramesh A. Gopinath: Adaptation of front end parameters in a speech recognizer.
Diego Giuliani, Matteo Gerosa, Fabio Brugnara: Speaker normalization through constrained MLLR based transforms.
Xiangyu Mu, Shuwu Zhang, Bo Xu: Multi-layer structure MLLR adaptation algorithm with subspace regression classes and tying.
Georg Stemmer, Stefan Steidl, Christian Hacker, Elmar Nöth: Adaptation in the pronunciation space for non-native speech recognition.
Xuechuan Wang, Douglas D. O'Shaughnessy: Robust ASR model adaptation by feature-based statistical data mapping.
Brian Mak, Simon Ho, James T. Kwok: Speedup of kernel eigenvoice speaker adaptation by embedded kernel PCA.
Hyung Bae Jeon, Dong Kook Kim: Maximum a posteriori eigenvoice speaker adaptation for Korean connected digit recognition.
Koji Tanaka, Fuji Ren, Shingo Kuroiwa, Satoru Tsuge: Acoustic model adaptation for coded speech using synthetic speech.
Motoyuki Suzuki, Hirokazu Ogasawara, Akinori Ito, Yuichi Ohkawa, Shozo Makino: Speaker adaptation method for CALL system using bilingual speakers' utterances.
Shinji Watanabe: Acoustic model adaptation based on coarse/fine training of transfer vectors and its application to a speaker adaptation task.
Wei-Ho Tsai, Shih-Sian Cheng, Hsin-Min Wang: Speaker clustering of speech utterances using a voice characteristic reference space.
Young Kuk Kim, Hwa Jeon Song, Hyung Soon Kim: Performance improvement of connected digit recognition using unsupervised fast speaker adaptation.
Hyung Soon Kim, Hwa Jeon Song: Simultaneous estimation of weights of eigenvoices and bias compensation vector for rapid speaker adaptation.
Matthias Wölfel: Speaker dependent model order selection of spectral envelopes.
Enrico Bocchieri, Michael Riley, Murat Saraclar: Methods for task adaptation of acoustic models with limited transcribed in-domain data.
Atsushi Fujii, Tetsuya Ishikawa, Katsunobu Itou, Tomoyosi Akiba: Unsupervised topic adaptation for lecture speech retrieval.
Haibin Liu, Zhenyang Wu: Mean and covariance adaptation based on minimum classification error linear regression for continuous density HMMs.
Goshu Nagino, Makoto Shozakai: Design of ready-made acoustic model library by two-dimensional visualization of acoustic space.
Spoken Language Identification, Translation and Retrieval I
Mark Huckvale: ACCDIST: a metric for comparing speakers' accents.
Michael Levit, Allen L. Gorin, Patrick Haffner, Hiyan Alshawi, Elmar Nöth: Aspects of named entity processing.
Josep Maria Crego, José B. Mariño, Adrià de Gispert: Finite-state-based and phrase-based statistical machine translation.
Tanja Schultz, Szu-Chen Stan Jou, Stephan Vogel, Shirin Saleem: Using word lattice information for a tighter coupling in speech translation systems.
Teruhisa Misu, Tatsuya Kawahara, Kazunori Komatani: Confirmation strategy for document retrieval systems with spoken dialog interface.
Shi-wook Lee, Kazuyo Tanaka, Yoshiaki Itoh: Multilayer subword units for open-vocabulary spoken document retrieval.
Yoshiaki Itoh, Kazuyo Tanaka, Shi-wook Lee: An efficient partial matching algorithm toward speech retrieval by speech.
Celestin Sedogbo, Sébastien Herry, Bruno Gas, Jean-Luc Zarader: Language detection by neural discrimination.
Ricardo de Córdoba, Javier Ferreiros, Valentín Sama, Javier Macías Guarasa, Luis Fernando D'Haro, Fernando Fernández: Language identification techniques based on full recognition in an air traffic control task.
John H. L. Hansen, Umit H. Yapanel, Rongqing Huang, Ayako Ikeno: Dialect analysis and modeling for automatic classification.
Pascale Fung, Yi Liu, Yongsheng Yang, Yihai Shen, Dekai Wu: A grammar-based Chinese to English speech translation system for portable devices.
Gökhan Tür: Cost-sensitive call classification.
Mikko Kurimo, Ville T. Turunen, Inger Ekman: An evaluation of a spoken document retrieval baseline system in Finnish.
Hui Jiang, Pengfei Liu, Imed Zitouni: Discriminative training of naive Bayes classifiers for natural language call routing.
Nicolas Moreau, Hyoung-Gook Kim, Thomas Sikora: Phonetic confusion based document expansion for spoken document retrieval.
Euisok Chung, Soojong Lim, Yi-Gyu Hwang, Myung-Gil Jang: Hybrid named entity recognition for question-answering system.
Eric Sanders, Febe de Wet: Histogram normalisation and the recognition of names and ontology words in the MUMIS project.
Rui Amaral, Isabel Trancoso: Improving the topic indexation and segmentation modules of a media watch system.
Melissa Barkat-Defradas, Rym Hamdi, Emmanuel Ferragne, François Pellegrino: Speech timing and rhythmic structure in Arabic dialects: a comparison of two approaches.
Jen-Wei Kuo, Yao-Min Huang, Berlin Chen, Hsin-Min Wang: Statistical Chinese spoken document retrieval using latent topical information.
Masahiko Matsushita, Hiromitsu Nishizaki, Seiichi Nakagawa, Takehito Utsuro: Keyword recognition and extraction by multiple-LVCSRs with 60,000 words in speech-driven WEB retrieval task.
Ruiqiang Zhang, Gen-ichiro Kikui, Hirofumi Yamamoto, Frank K. Soong, Taro Watanabe, Eiichiro Sumita, Wai Kit Lo: Improved spoken language translation using n-best speech recognition hypotheses.
Bowen Zhou, Daniel Déchelotte, Yuqing Gao: Two-way speech-to-speech translation on handheld devices.
Hervé Blanchon: HLT modules scalability within the NESPOLE! project.
Linguistics, Phonology, and Phonetics
Midam Kim: Correlation between VOT and F0 in the perception of Korean stops and affricates.
Aude Noiray, Lucie Ménard, Marie-Agnès Cathiard, Christian Abry, Christophe Savariaux: The development of anticipatory labial coarticulation in French: a pioneering study.
Melvyn John Hunt: Speech recognition, syllabification and statistical phonetics.
Jilei Tian: Data-driven approaches for automatic detection of syllable boundaries.
Anne Cutler, Dennis Norris, Núria Sebastián-Gallés: Phonemic repertoire and similarity within the vocabulary.
Jong-Pyo Lee, Tae-Yeoub Jang: A comparative study on the production of inter-stress intervals of English speech by English native speakers and Korean speakers.
Emi Zuiki Murano, Mihoko Teshigawara: Articulatory correlates of voice qualities of good guys and bad guys in Japanese anime: an MRI study.
Sorin Dusan: Effects of phonetic contexts on the duration of phonetic segments in fluent read speech.
Qiang Fang: A study on nasal coda loss in continuous speech.
Hua-Li Jian: An improved pair-wise variability index for comparing the timing characteristics of speech.
Hua-Li Jian: An acoustic study of speech rhythm in Taiwan English.
Sung-A. Kim: Language specific phonetic rules: evidence from domain-initial strengthening.
Hansang Park: Spectral characteristics of the release bursts in Korean alveolar stops.
Rob van Son, Olga Bolotova, Louis C. W. Pols, Mietta Lennes: Frequency effects on vowel reduction in three typologically different languages (Dutch, Finnish, Russian).
Sunhee Kim: Phonology of exceptions for Korean grapheme-to-phoneme conversion.
Shigeyoshi Kitazawa, Shinya Kiriyama: Acoustic and prosodic analysis of Japanese vowel-vowel hiatus with laryngeal effect.
Kimiko Tsukada: A cross-linguistic acoustic comparison of unreleased word-final stops: Korean and Thai.
Taehong Cho, Elizabeth K. Johnson: Acoustic correlates of phrase-internal lexical boundaries in Dutch.
Taehong Cho, James M. McQueen: Phonotactics vs. phonetic cues in native and non-native listening: Dutch and Korean listeners' perception of Dutch and English.
Svetlana Kaminskaia, François Poiré: Comparing intonation of two varieties of French using normalized F0 values.
H. Timothy Bunnell, James B. Polikoff, Jane McNicholas: Spectral moment vs. bark cepstral analysis of children's word-initial voiceless stops.
Nobuaki Minematsu: Pronunciation assessment based upon the compatibility between a learner's pronunciation structure and the target language's lexical structure.
Kenji Yoshida: Spread of high tone in Akita Japanese.
Biomedical Applications of Speech Analysis
Juan Ignacio Godino-Llorente, María Victoria Rodellar Biarge, Pedro Gómez Vilda, Francisco Díaz Pérez, Agustín Álvarez Marquina, Rafael Martínez-Olalla: Biomechanical parameter fingerprint in the mucosal wave power spectral density.
Cheolwoo Jo, Soo-Geon Wang, Byung-Gon Yang, Hyung-Soon Kim, Tao Li: Classification of pathological voice including severely noisy cases.
Hiroki Mori, Yasunori Kobayashi, Hideki Kasuya, Hajime Hirose, Noriko Kobayashi: F0 and formant frequency distribution of dysarthric speech - a comparative study.
Hideki Kawahara, Yumi Hirachi, Masanori Morise, Hideki Banno: Procedure "senza vibrato": a key component for morphing singing.
Claudia Manfredi, Giorgio Peretti, Laura Magnoni, Fabrizio Dori, Ernesto Iadanza: Thyroplastic medialisation in unilateral vocal fold paralysis: assessing voice quality recovering.
Jong Min Choi, Myung-Whun Sung, Kwang Suk Park, Jeong-Hun Hah: A comparison of the perturbation analysis between PRAAT and computerize speech lab.
Robust Speech Recognition on AURORA
Hugo Van hamme: PROSPECT features and their application to missing data techniques for robust speech recognition.
Hugo Van hamme, Patrick Wambacq, Veronique Stouten: Accounting for the uncertainty of speech estimates in the context of model-based feature enhancement.
Hans-Günter Hirsch, Harald Finster: Applying the Aurora feature extraction schemes to a phoneme based recognition task.
Zhipeng Zhang, Tomoyuki Ohya, Sadaoki Furui: Evaluation of tree-structured piecewise linear transformation-based noise adaptation on AURORA2 database.
Tor André Myrvoll, Satoshi Nakamura: Online minimum mean square error filtering of noisy cepstral coefficients using a sequential EM algorithm.
Akira Sasou, Kazuyo Tanaka, Satoshi Nakamura, Futoshi Asano: HMM-based feature compensation method: an evaluation using the AURORA2.
Xuechuan Wang, Douglas D. O'Shaughnessy: Noise adaptation for robust AURORA 2 noisy digit recognition using statistical data mapping.
Benjamin J. Shannon, Kuldip K. Paliwal: MFCC computation from magnitude spectrum of higher lag autocorrelation coefficients for robust speech recognition.
Muhammad Ghulam, Takashi Fukuda, Junsei Horikawa, Tsuneo Nitta: A noise-robust feature extraction method based on pitch-synchronous ZCPA for ASR.
José C. Segura, Ángel de la Torre, Javier Ramírez, Antonio J. Rubio, M. Carmen Benítez: Including uncertainty of speech observations in robust speech recognition.
Takeshi Yamada, Jiro Okada, Nobuhiko Kitawaki: Integration of n-best recognition results obtained by multiple noise reduction algorithms.
Panji Setiawan, Sorel Stan, Tim Fingscheidt: Revisiting some model-based and data-driven denoising algorithms in Aurora 2 context.
Guo-Hong Ding, Bo Xu: Exploring high-performance speech recognition in noisy environments using high-order Taylor series expansion.
Siu Wa Lee, Pak-Chung Ching: In-phase feature induction: an effective compensation technique for robust speech recognition.
Jeff Siu-Kei Au-Yeung, Man-Hung Siu: Improved performance of Aurora 4 using HTK and unsupervised MLLR adaptation.
Shang-nien Tsai, Lin-Shan Lee: A new feature extraction front-end for robust speech recognition using progressive histogram equalization and multi-eigenvector temporal filtering.
Spoken / Multimodal Dialogue System
Christian Fügen, Hartwig Holzapfel, Alex Waibel: Tight coupling of speech recognition and dialog management - dialog-context dependent grammar weighting for speech recognition.
Akinobu Lee, Keisuke Nakamura, Ryuichi Nisimura, Hiroshi Saruwatari, Kiyohiro Shikano: Noise robust real world spoken dialogue system using GMM based rejection of unintended inputs.
Hironori Oshikawa, Norihide Kitaoka, Seiichi Nakagawa: Speech interface for name input based on combination of recognition methods using syllable-based n-gram and word dictionary.
Imed Zitouni, Minkyu Lee, Hui Jiang: Constrained minimization technique for topic identification using discriminative training and support vector machines.
Takashi Konashi, Motoyuki Suzuki, Akinori Ito, Shozo Makino: A spoken dialog system based on automatic grammar generation and template-based weighting for autonomous mobile robots.
Akinori Ito, Takanobu Oba, Takashi Konashi, Motoyuki Suzuki, Shozo Makino: Noise adaptive spoken dialog system based on selection of multiple dialog strategies.
Mikko Hartikainen, Markku Turunen, Jaakko Hakulinen, Esa-Pekka Salonen, J. Adam Funk: Flexible dialogue management using distributed and dynamic dialogue control.
Keith Houck: Contextual revision in information seeking conversation systems.
Ian M. O'Neill, Philip Hanna, Xingkun Liu, Michael F. McTear: Cross domain dialogue modelling: an object-based approach.
Hirohiko Sagawa, Teruko Mitamura, Eric Nyberg: A comparison of confirmation styles for error handling in a speech dialog system.
Fan Yang, Peter A. Heeman, Kristy Hollingshead: Towards understanding mixed-initiative in task-oriented dialogues.
Peter Wolf, Joseph Woelfel, Jan van Gemert, Bhiksha Raj, David Wong: Spokenquery: an alternate approach to choosing items with speech.
Shona Douglas, Deepak Agarwal, Tirso Alonso, Robert M. Bell, Mazin G. Rahim, Deborah F. Swayne, Chris Volinsky: Mining customer care dialogs for "daily news".
Jens Edlund, Gabriel Skantze, Rolf Carlson: Higgins - a spoken dialogue system for investigating error handling techniques.
Fuliang Weng, Lawrence Cavedon, Badri Raghunathan, Danilo Mirkovic, Hua Cheng, Hauke Schmidt, Harry Bratt, Rohit Mishra, Stanley Peters, Sandra Upson, Elizabeth Shriberg, Carsten Bergmann, Lin Zhao: A conversational dialogue system for cognitively overloaded users.
Matthew N. Stuttle, Jason D. Williams, Steve Young: A framework for dialogue data collection with a simulated ASR channel.
Shimei Pan: A multi-layer conversation management approach for information seeking applications.
Keita Hayashi, Yuki Irie, Yukiko Yamaguchi, Shigeki Matsubara, Nobuo Kawaguchi: Speech understanding, dialogue management and response generation in corpus-based spoken dialogue system.
Fernando Fernández, Valentín Sama, Luis Fernando D'Haro, Rubén San Segundo, Ricardo de Córdoba, Juan Manuel Montero: Implementation of dialog applications in an open-source voiceXML platform.
Chun Wai Lau, Bin Ma, Helen Mei-Ling Meng, Yiu-Sang Moon, Yeung Yam: Fuzzy logic decision fusion in a multimodal biometric system.
Peter Poller, Norbert Reithinger: A state model for the realization of visual perceptive feedback in SmartKom.
Akemi Iida, Yoshito Ueno, Ryohei Matsuura, Kiyoaki Aikawa: A vector-based method for efficiently representing multivariate environmental information.
Ioannis Toptsis, Shuyin Li, Britta Wrede, Gernot A. Fink: A multi-modal dialog system for a mobile robot.
Niels Ole Bernsen, Laila Dybkjær: Structured interview-based evaluation of spoken multimodal conversation with H.C. Andersen.
Speech Recognition - Search
Miroslav Novak, Vladimír Bergl: Memory efficient decoding graph compilation with wide cross-word acoustic context.
Takaaki Hori, Chiori Hori, Yasuhiro Minami: Fast on-the-fly composition for weighted finite-state transducers in 1.8 million-word vocabulary continuous speech recognition.
Peng Yu, Frank Torsten Bernd Seide: A hybrid word / phoneme-based approach for improved vocabulary-independent search in spontaneous speech.
Frédéric Tendeau: Optimizing an engine network that allows dynamic masking.
Spoken Dialogue and Systems
Katsutoshi Ohtsuki, Nobuaki Hiroshima, Yoshihiko Hayashi, Katsuji Bessho, Shoichi Matsunaga: Topic structure extraction for meeting indexing.
Gina-Anne Levow: Identifying local corrections in human-computer dialogue.
Peter Reichl, Florian Hammer: Hot discussion or frosty dialogue? towards a temperature metric for conversational interactivity.
Stephanie Seneff, Chao Wang, I. Lee Hetherington, Grace Chung: A dynamic vocabulary spoken dialogue interface.
Matthias Denecke, Kohji Dohsaka, Mikio Nakano: Learning dialogue policies using state aggregation in reinforcement learning.
Speech Perception
Keren B. Shatzman: Segmenting ambiguous phrases using phoneme duration.
Shuichi Sakamoto, Yôiti Suzuki, Shigeaki Amano, Tadahisa Kondo, Naoki Iwaoka: A compensation method for word-familiarity difference with SNR control in intelligibility test.
Takashi Otake, Yoko Sakamoto, Yasuyuki Konomi: Phoneme-based word activation in spoken-word recognition: evidence from Japanese school children.
Belynda Brahimi, Philippe Boula de Mareüil, Cédric Gendrot: Role of segmental and suprasegmental cues in the perception of Maghrebian-accented French.
Hiroaki Kato, Yoshinori Sagisaka, Minoru Tsuzaki, Makiko Muto: Effect of speaking rate on the acceptability of change in segment duration.
Kiyoko Yoneyama: A cross-linguistic study of diphthongs in spoken word processing in Japanese and English.
Multi-Lingual Speech-to-Speech Translation
Alex Waibel: Speech translation: past, present and future.
Gen-ichiro Kikui, Toshiyuki Takezawa, Seiichi Yamamoto: Multilingual corpora for speech-to-speech translation research.
Hermann Ney: Statistical machine translation and its challenges.
Youngjik Lee, Jun Park, Seung-Shin Oh: Usability considerations of speech-to-speech translation system.
Gianni Lazzari, Alex Waibel, Chengqing Zong: Worldwide ongoing activities on multilingual speech to speech translation.
Speech Recognition - Large Vocabulary
Dominique Fohr, Odile Mella, Christophe Cerisara, Irina Illina: The automatic news transcription system: ANTS, some real time experiments.
Bhuvana Ramabhadran, Olivier Siohan, Geoffrey Zweig: Use of metadata to improve recognition of spontaneous speech and named entities.
Tanel Alumäe: Large vocabulary continuous speech recognition for Estonian using morpheme classes.
Zhaobing Han, Shuwu Zhang, Bo Xu: Combining agglomerative and tree-based state clustering for high accuracy acoustic modeling.
William S.-Y. Wang, Gang Peng: Parallel tone score association method for tone language speech recognition.
Jing Zheng, Horacio Franco, Andreas Stolcke: Effective acoustic modeling for rate-of-speech variation in large vocabulary conversational speech recognition.
G. L. Sarada Ghadiyaram, N. Hemalatha Nagarajan, T. Nagarajan Thangavelu, Hema A. Murthy: Automatic transcription of continuous speech using unsupervised and incremental training.
Jan Nouza, Dana Nejedlová, Jindrich Zdánský, Jan Kolorenc: Very large vocabulary speech recognition system for automatic transcription of Czech broadcast programs.
Olivier Siohan, Bhuvana Ramabhadran, Geoffrey Zweig: Speech recognition error analysis on the English MALACH corpus.
Xianghua Xu, Qiang Guo, Jie Zhu: Restructuring HMM states for speaker adaptation in Mandarin speech recognition.
Mike Matton, Mathias De Wachter, Dirk Van Compernolle, Ronald Cools: A discriminative locally weighted distance measure for speaker independent template based speech recognition.
Yohei Itaya, Heiga Zen, Yoshihiko Nankaku, Chiyomi Miyajima, Keiichi Tokuda, Tadashi Kitamura: Deterministic annealing EM algorithm in parameter estimation for acoustic model.
Frank K. Soong, Wai Kit Lo, Satoshi Nakamura: Optimal acoustic and language model weights for minimizing word verification errors.
Atsushi Sako, Yasuo Ariki: Structuring of baseball live games based on speech recognition using task dependant knowledge.
In-Jeong Choi, Nam-Hoon Kim, Su Youn Yoon: Large vocabulary continuous speech recognition based on cross-morpheme phonetic information.
Changxue Ma: Automatic phonetic base form generation based on maximum context tree.
Gustavo Hernández Ábrego, Lex Olorenshaw, Raquel Tato, Thomas Schaaf: Dictionary refinements based on phonetic consensus and non-uniform pronunciation reduction.
Takahiro Shinozaki, Sadaoki Furui: Spontaneous speech recognition using a massively parallel decoder.
Tanja Schultz, Qin Jin, Kornel Laskowski, Yue Pan, Florian Metze, Christian Fügen: Issues in meeting transcription - the ISL meeting transcription system.
Katsutoshi Ohtsuki, Nobuaki Hiroshima, Shoichi Matsunaga, Yoshihiko Hayashi: Multi-pass ASR using vocabulary expansion.
Vlasios Doumpiotis, William Byrne: Pinched lattice minimum Bayes risk discriminative training for large vocabulary continuous speech recognition.
Izhak Shafran, William Byrne: Task-specific minimum Bayes-risk decoding using learned edit distance.
Rong Zhang, Alexander I. Rudnicky: Apply n-best list re-ranking to acoustic model combinations of boosting training.
Do Yeong Kim, S. Umesh, M. J. F. Gales, Thomas Hain, Philip C. Woodland: Using VTLN for broadcast news transcription.
Andreas Stolcke, Chuck Wooters, Ivan Bulyko, Martin Graciarena, Scott Otterson, Barbara Peskin, Mari Ostendorf, David Gelbart, Nikki Mirghafori, Tuomo W. Pirinen: From Switchboard to meetings: development of the 2004 ICSI-SRI-UW meeting recognition system.
Anand Venkataraman, Andreas Stolcke, Wen Wang, Dimitra Vergyri, Jing Zheng, Venkata Ramana Rao Gadde: An efficient repair procedure for quick transcriptions.
Yao Qian, Tan Lee, Frank K. Soong: Tone information as a confidence measure for improving Cantonese LVCSR.
Speech Science
Danielle Due: Temporal variables in parkinsonian speech.
Olov Engwall: Speaker adaptation of a three-dimensional tongue model.
Hideki Kawahara, Hideki Banno, Toshio Irino, Jiang Jin: Intelligibility of degraded speech from smeared STRAIGHT spectrum.
Sachiyo Kajikawa, Laurel Fais, Shigeaki Amano, Janet Werker: Adult and infant sensitivity to phonotactic features in spoken Japanese.
Valter Ciocca, Tara L. Whitehill, Joan K.-Y. Ma: The effect of intonation on perception of Cantonese lexical tones.
Toshiko Isei-Jaakkola: Maximum short quantity in Japanese and Finnish in two perception tests with F0 and db variants.
Paavo Alku, Matti Airas, Brad H. Story: Evaluation of an inverse filtering technique using physical modeling of voice production.
Hui-ju Hsu, Janice Fon: Positional and phonotactic effects on the realization of Taiwan Mandarin tone 2.
Karl Schnell, Arild Lacroix: Speech production based on lossy tube models: unit concatenation and sound transitions.
Qin Yan, Saeed Vaseghi, Dimitrios Rentzos, Ching-Hsiang Ho: Modelling and ranking of differences across formants of British, Australian and American accents.
Tatsuya Kitamura, Satoru Fujita, Kiyoshi Honda, Hironori Nishimoto: An experimental method for measuring transfer functions of acoustic tubes.
Takuya Tsuji, Tokihiko Kaburagi, Kohei Wakamiya, Jiji Kim: Estimation of the vocal tract spectrum from articulatory movements using phoneme-dependent neural networks.
Kunitoshi Motoki, Hiroki Matsuzaki: Computation of the acoustic characteristics of vocal-tract models with geometrical perturbation.
Abdellah Kacha, Francis Grenez, Frédéric Bettens, Jean Schoentgen: Adaptive long-term predictive analysis of disordered speech.
Slobodan Jovicic, Sandra Antesevic, Zoran Saric: Phoneme restoration in degraded speech communication.
Maria Marinaki, Constantine Kotropoulos, Ioannis Pitas, Nikolaos Maglaveras: Automatic detection of vocal fold paralysis and edema.
Novel Features in ASR
Yasuhiro Minami, Erik McDermott, Atsushi Nakamura, Shigeru Katagiri: A theoretical analysis of speech recognition based on feature trajectory models.
Zhijian Ou, Zuoying Wang: Discriminative combination of multiple linear predictions for speech recognition.
Davood Gharavian, Seyed Mohammad Ahadi: Use of formants in stressed and unstressed continuous speech recognition.
Konstantin Markov, Satoshi Nakamura, Jianwu Dang: Integration of articulatory dynamic parameters in HMM/BN based speech recognition system.
Leigh David Alsteris, Kuldip K. Paliwal: ASR on speech reconstructed from short-time Fourier phase spectra.
Spoken and Natural Language Understanding
Robert Lieb, Tibor Fábián, Günther Ruske, Matthias Thomae: Estimation of semantic confidences on lattice hierarchies.
Elizabeth Shriberg, Andreas Stolcke, Dustin Hillard, Mari Ostendorf, Barbara Peskin, Mary P. Harper, Yang Liu: The ICSI-SRI-UW metadata extraction system.
Mark Hasegawa-Johnson, Stephen E. Levinson, Tong Zhang: Automatic detection of contrast for speech understanding.
Nick Jui-Chang Wang, Jia-Lin Shen, Ching-Ho Tsai: Integrating layer concept information into n-gram modeling for spoken language understanding.
Toshihiko Itoh, Atsuhiko Kai, Yukihiro Itoh, Tatsuhiro Konishi: An understanding strategy based on plausibility score in recognition history using CSR confidence measure.
Sangkeun Jung, Minwoo Jeong, Gary Geunbae Lee: Speech recognition error correction using maximum entropy language model.
Xiang Li, Juan M. Huerta: Discriminative training of compound-word based multinomial classifiers for speech routing.
Jihyun Eun, Changki Lee, Gary Geunbae Lee: An information extraction approach for spoken language understanding.
David Horowitz, Partha Lal, Pierce Gerard Buckley: A maximum entropy shallow functional parser for spoken language understanding.
Chung-Hsien Wu, Jui-Feng Yeh, Ming-Jun Chen: Speech act identification using an ontology-based partial pattern tree.
Ye-Yi Wang, Yun-Cheng Ju: Creating speech recognition grammars from regular expressions for alphanumeric concepts.
Tasuku Kitade, Tatsuya Kawahara, Hiroaki Nanjo: Automatic extraction of key sentences from oral presentations using statistical measure based on discourse markers.
Tomohiro Ohno, Shigeki Matsubara, Nobuo Kawaguchi, Yasuyoshi Inagaki: Robust dependency parsing of spontaneous Japanese speech and its evaluation.
Wolfgang Minker, Dirk Bühler, Christiane Beuschel: Strategies for optimizing a stochastic spoken natural language parser.
Tzu-Lun Lee, Ya-Fang He, Yun-Ju Huang, Shu-Chuan Tseng, Robert Eklund: Prolongation in spontaneous Mandarin.
Yuki Irie, Shigeki Matsubara, Nobuo Kawaguchi, Yukiko Yamaguchi, Yasuyoshi Inagaki: Speech intention understanding based on decision tree learning.
Satanjeev Banerjee, Alexander I. Rudnicky: Using simple speech-based features to detect the state of a meeting and the roles of the meeting participants.
Serdar Yildirim, Murtaza Bulut, Chul Min Lee, Abe Kazemzadeh, Zhigang Deng, Sungbok Lee, Shrikanth Narayanan, Carlos Busso: An acoustic study of emotions expressed in speech.
Tatsuya Kawahara, Ian Richard Lane, Tomoko Matsui, Satoshi Nakamura: Topic classification and verification modeling for out-of-domain utterance detection.
So-Young Park, Yong-Jae Kwak, Joon-Ho Lim, Hae-Chang Rim, Soo-Hong Kim: Partially lexicalized parsing model utilizing rich features.
Yoshimi Suzuki, Fumiyo Fukumoto, Yoshihiro Sekiguchi: Clustering similar nouns for selecting related news articles.
Leonardo Badino: Chinese text word-segmentation considering semantic links among sentences.
Speaker Segmentation and Clustering
Lori Lamel, Jean-Luc Gauvain, Leonardo Canseco-Rodriguez: Speaker diarization from speech transcripts.
Xavier Anguera Miró, Javier Hernando Pericas: Evolutive speaker segmentation using a repository system.
Hagai Aronowitz, David Burshtein, Amihood Amir: Speaker indexing in audio archives using test utterance Gaussian mixture modeling.
Antoine Raux: Automated lexical adaptation and speaker clustering based on pronunciation habits for non-native speech recognition.
Speech Processing in a Packet Network Environment
Kuldip K. Paliwal, Stephen So: Scalable distributed speech recognition using multi-frame GMM-based block quantization.
Naveen Srinivasamurthy, Kyu Jeong Han, Shrikanth Narayanan: Robust speech recognition over packet networks: an overview.
Peter Kabal, Colm Elliott: Synchronization of speaker selection for centralized tandem free VoIP conferencing.
Akitoshi Kataoka, Yusuke Hiwasaki, Toru Morinaga, Jotaro Ikedo: Measuring the perceived importance of time- and frequency-divided speech blocks for transmitting over packet networks.
Moo Young Kim, W. Bastiaan Kleijn: Comparison of transmitter-based packet-loss recovery techniques for voice transmission.
Acoustic Modeling
Dominique Fohr, Odile Mella, Irina Illina, Christophe Cerisara: Experiments on the accuracy of phone models and liaison processing in a French broadcast news transcription system.
Jorge F. Silva, Shrikanth Narayanan: A statistical discrimination measure for hidden Markov models based on divergence.
Jan Stadermann, Gerhard Rigoll: A hybrid SVM/HMM acoustic modeling approach to automatic speech recognition.
Dirk Knoblauch: Data driven number-of-states selection in HMM topologies.
Youngkyu Cho, Sung-a Kim, Dongsuk Yook: Hybrid model using subspace distribution clustering hidden Markov models and semi-continuous hidden Markov models for embedded speech recognizers.
Peder A. Olsen, Karthik Visweswariah: Fast clustering of Gaussians and the virtue of representing Gaussians in exponential model format.
Karen Livescu, James R. Glass: Feature-based pronunciation modeling with trainable asynchrony probabilities.
Hong-Kwang Jeff Kuo, Yuqing Gao: Maximum entropy direct model as a unified model for acoustic modeling in speech recognition.
Arthur Chan, Mosur Ravishankar, Alexander I. Rudnicky, Jahanzeb Sherwani: Four-layer categorization scheme of fast GMM computation techniques in large vocabulary continuous speech recognition systems.
Takatoshi Jitsuhiro, Satoshi Nakamura: Increasing the mixture components of non-uniform HMM structures based on a variational Bayesian approach.
Panu Somervuo: Comparison of ML, MAP, and VB based acoustic models in large vocabulary speech recognition.
Wolfgang Macherey, Ralf Schlüter, Hermann Ney: Discriminative training with tied covariance matrices.
Gue Jun Jung, Su-Hyun Kim, Yung-Hwan Oh: An efficient codebook design in SDCHMM for mobile communication environments.
Makoto Shozakai, Goshu Nagino: Analysis of speaking styles by two-dimensional visualization of aggregate of acoustic models.
Myoung-Wan Koo, Ho-Hyun Jeon, Sang-Hong Lee: Context dependent phoneme duration modeling with tree-based state tying.
John Scott Bridle: Towards better understanding of the model implied by the use of dynamic features in HMMs.
Prosody Modeling and Generation
Jian-Feng Li, Guoping Hu, Ren-Hua Wang: Chinese prosody phrase break prediction based on maximum entropy model.
Yu Zheng, Gary Geunbae Lee, Byeongchang Kim: Using multiple linguistic features for Mandarin phrase break prediction in maximum-entropy classification framework.
David Escudero Mancebo, Valentín Cardeñoso-Payo: A proposal to quantitatively select the right intonation unit in data-driven intonation modeling.
Salma Mouline, Olivier Boëffard, Paul C. Bagshaw: Automatic adaptation of the MOMEL F0 stylisation algorithm to new corpora.
Pablo Daniel Agüero, Klaus Wimmer, Antonio Bonafonte: Joint extraction and prediction of Fujisaki's intonation model parameters.
Panagiotis Zervas, Nikos Fakotakis, George K. Kokkinakis, Georgios Kouroupetroglou, Gerasimos Xydas: Evaluation of corpus based tone prediction in mismatched environments for Greek TTS synthesis.
Volker Dellwo, Bianca Aschenberner, Petra Wagner, Jana Dancovicova, Ingmar Steiner: Bonntempo-corpus and bonntempo-tools: a database for the study of speech rhythm and rate.
Wentao Gu, Keikichi Hirose, Hiroya Fujisaki: Analysis of F0 contours of Cantonese utterances based on the command-response model.
Marion Dohen, Hélène Loevenbruck: Pre-focal rephrasing, focal enhancement and postfocal deaccentuation in French.
Sridhar Krishna Nemala, Partha Pratim Talukdar, Kalika Bali, A. G. Ramakrishnan: Duration modeling for Hindi text-to-speech synthesis system.
Gerasimos Xydas, Georgios Kouroupetroglou: An intonation model for embedded devices based on natural F0 samples.
Multi-Sensor ASR
Martin Graciarena, Federico Cesari, Horacio Franco, Gregory K. Myers, Cregg Cowan, Victor Abrash: Combination of standard and throat microphones for robust speech recognition in highly noisy environments.
Cenk Demiroglu, David V. Anderson: Noise robust digit recognition using a glottal radar sensor for voicing detection.
Dominik Raub, John W. McDonough, Matthias Wölfel: A cepstral domain maximum likelihood beamformer for speech recognition.
Naoya Mochiki, Tetsunori Kobayashi, Toshiyuki Sekiya, Tetsuji Ogawa: Recognition of three simultaneous utterance of speech by four-line directivity microphone mounted on head of robot.
Shigeki Sagayama, Okajima Takashi, Kamamoto Yutaka, Takuya Nishimoto: Complex spectrum circle centroid for microphone-array-based noisy speech recognition.
Larry P. Heck, Mark Z. Mao: Automatic speech recognition of co-channel speech: integrated speaker and speech recognition approach.
Multi-Lingual Speech Processing
José B. Mariño, Asunción Moreno, Albino Nogueiras: A first experience on multilingual acoustic modeling of the languages spoken in Morocco.
Mónica Caballero, Asunción Moreno, Albino Nogueiras: Data driven multidialectal phone set for Spanish dialects.
Harald Romsdorfer, Beat Pfister: Multi-context rules for phonological processing in polyglot TTS synthesis.
Leonardo Badino, Claudia Barolo, Silvia Quazza: A general approach to TTS reading of mixed-language texts.
Panayiotis G. Georgiou, Shrikanth S. Narayanan, Hooman Shirani Mehr: Context dependent statistical augmentation of Persian transcripts.
Speech Enhancement
Cenk Demiroglu, David V. Anderson: A soft decision MMSE amplitude estimator as a noise preprocessor to speech coders using a glottal sensor.
Rongqiang Hu, David V. Anderson: Single acoustic-channel speech enhancement based on glottal correlation using non-acoustic sensor.
Xianxian Zhang, John H. L. Hansen, Kathryn Hoberg Arehart, Jessica Rossi-Katz: In-vehicle based speech processing for hearing impaired subjects.
Tomohiro Nakatani, Keisuke Kinoshita, Masato Miyoshi, Parham Zolfaghari: Harmonicity based monaural speech dereverberation with time warping and F0 adaptive window.
Marc Delcroix, Takafumi Hikichi, Masato Miyoshi: Dereverberation of speech signals based on linear prediction.
Speech and Affect
Nick Campbell: Perception of affect in speech - towards an automatic processing of paralinguistic information in spoken conversation.
Noël Chateau, Valérie Maffiolo, Christophe Blouin: Analysis of emotional speech in voice mail messages: the influence of speakers' gender.
Chul Min Lee, Serdar Yildirim, Murtaza Bulut, Abe Kazemzadeh, Carlos Busso, Zhigang Deng, Sungbok Lee, Shrikanth Narayanan: Emotion recognition based on phoneme classes.
Ailbhe Ní Chasaide, Christer Gobl: Decomposing linguistic and affective components of phonatory quality.
Dan-Ning Jiang, Lian-Hong Cai: Classifying emotion in Chinese speech by decomposing prosodic features.
Takashi X. Fujisawa, Norman D. Cook: Identifying emotion in speech prosody using acoustical cues of harmony.
Jianhua Tao: Context based emotion detection from text input.
Atsushi Iwai, Yoshikazu Yano, Shigeru Okuma: Complex emotion recognition system for a specific user using SOM based on prosodic features.
Hoon-Young Cho, Kaisheng Yao, Te-Won Lee: Emotion verification for emotion detection and unknown emotion rejection.
Keikichi Hirose: Improvement in corpus-based generation of F0 contours using generation process model for emotional speech synthesis.
Speech Features
Rajesh Mahanand Hegde, Hema A. Murthy, Venkata Ramana Rao Gadde: Continuous speech recognition using joint features derived from the modified group delay function and MFCC.
Hua Yu: Phase-space representation of speech.
Hema A. Murthy, Rajesh Mahanand Hegde, Venkata Ramana Rao Gadde: The modified group delay feature: a new spectral representation of speech.
Barry Y. Chen, Qifeng Zhu, Nelson Morgan: Learning long-term temporal features in LVCSR using neural networks.
T. V. Sreenivas, G. V. Kiran, A. G. Krishna: Neural "spike rate spectrum" as a noise robust, speaker invariant feature for automatic speech recognition.
Yoshihisa Nakatoh, Makoto Nishizaki, Shinichi Yoshizawa, Maki Yamada: An adaptive MEL-LPC analysis for speech recognition.
Kentaro Ishizuka, Noboru Miyazaki, Tomohiro Nakatani, Yasuhiro Minami: Improvement in robustness of speech feature extraction method using sub-band based periodicity and aperiodicity decomposition.
Carlos Toshinori Ishi: A new acoustic measure for aspiration noise detection.
Kris Demuynck, Oscar Garcia, Dirk Van Compernolle: Synthesizing speech from speech recognition parameters.
Xiang Li, Richard M. Stern: Parallel feature generation based on maximizing normalized acoustic likelihood.
Kun-Ching Wang: An adaptive band-partitioning spectral entropy based speech detection in realistic noisy environments.
Javier Ramírez, José C. Segura, M. Carmen Benítez, Ángel de la Torre, Antonio J. Rubio: Improved voice activity detection combining noise reduction and subband divergence measures.
Kiyoung Park, Changkyu Choi, Jeongsu Kim: Voice activity detection using global soft decision with mixture of Gaussian model.
