ICSLP 1996: Philadelphia, PA, USA
The 4th International Conference on Spoken Language Processing, Philadelphia, PA, USA, October 3-6, 1996.
ISCA 1996
Plenary Lectures
- Anne Cutler:
The comparative study of spoken-language processing.
- James L. Flanagan:
Natural communication with machines - progress and challenge.
Large Vocabulary
- Z. Li, Michel Héon, Douglas D. O'Shaughnessy:
New developments in the INRS continuous speech recognition system.
- Lori Lamel, Gilles Adda:
On designing pronunciation lexicons for large vocabulary, continuous speech recognition.
- Pablo Fetter, Frédéric Dandurand, Peter Regel-Brietzmann:
Word graph rescoring using confidence measures.
- Xavier L. Aubert, Peter Beyerlein, Meinhard Ullrich:
A bottom-up approach for handling unseen triphones in large vocabulary continuous speech recognition.
- V. Valtchev, Philip C. Woodland, Steve J. Young:
Discriminative optimisation of large vocabulary recognition systems.
- Tatsuo Matsuoka, Katsutoshi Ohtsuki, Takeshi Mori, Sadaoki Furui, Katsuhiko Shirai:
Japanese large-vocabulary continuous-speech recognition using a business-newspaper corpus.
- David M. Carter, Jaan Kaja, Leonardo Neumeyer, Manny Rayner, Fuliang Weng, Mats Wirén:
Handling compound nouns in a Swedish speech-understanding system.
- Javier Macías Guarasa, Ascensión Gallardo-Antolín, Javier Ferreiros, José Manuel Pardo, Luis Villarrubia Grande:
Initial evaluation of a preselection module for a flexible large vocabulary speech recognition system in.
Multimodal ASR (Face and Lips)
- Mamoun Alissali, Paul Deléglise, Alexandrina Rogozan:
Asynchronous integration of visual information in an automatic speech recognition system.
- Iain A. Matthews, J. Andrew Bangham, S. J. Cox:
Audiovisual speech recognition using multiscale nonlinear image decomposition.
- Qin Su, Peter L. Silsbee:
Robust audiovisual integration using semicontinuous hidden Markov models.
- Richard P. Schumeyer, Kenneth E. Barner:
The effect of visual information on word initial consonant perception of dysarthric speech.
- Devi Chandramohan, Peter L. Silsbee:
A multiple deformable template approach for visual speech recognition.
- Piero Cosi, Emanuela Magno Caldognetto, Franco Ferrero, M. Dugatto, Kyriaki Vagges:
Speaker independent bimodal phonetic recognition experiments.
- Juergen Luettin, Neil A. Thacker, Steve W. Beet:
Speechreading using shape and intensity information.
- Juergen Luettin, Neil A. Thacker, Steve W. Beet:
Speaker identification by lipreading.
Perception of Words
- David W. Gow Jr., Janis Melvold, Sharon Manuel:
How word onsets drive lexical access and segmentation: evidence from acoustics, phonology and processing.
- David van Kuijk, Peter Wittenburg, Ton Dijkstra:
RAW: a real-speech model for human word recognition.
- Mehdi Meftah, Sami Boudelaa:
How facilitatory can lexical information be during word recognition? evidence from Moroccan Arabic.
- Alette P. Haveman:
Effects of frequency on the auditory perception of open- versus closed-class words.
- Michael S. Vitevitch, Paul A. Luce, Jan Charles-Luce, David Kemmerer:
Phonotactic and metrical influences on adult ratings of spoken nonsense words.
- Edward T. Auer, Lynne E. Bernstein:
Lipreading supplemented by voice fundamental frequency: to what extent does the addition of voicing increase lexical uniqueness for the lipreader?
- Saskia te Riele, Sieb G. Nooteboom, Hugo Quené:
Strategies used in rhyme-monitoring.
- Wilma van Donselaar, Cecile T. L. Kuijpers, Anne Cutler:
How do Dutch listeners process words with epenthetic schwa?
Phonetics, Transcription, and Analysis
- Patrick Juola, Philip Zimmermann:
Whole-word phonetic distances and the PGPfone alphabet.
- Shuping Ran, J. Bruce Millar, Phil Rose:
Automatic vowel quality description using a variable mapping to an eight cardinal vowel reference set.
- Andreas Kipp, Maria-Barbara Wesenick, Florian Schiel:
Automatic detection and segmentation of pronunciation variants in German speech corpora.
- Stephanie Seneff, Raymond Lau, Helen M. Meng:
ANGIE: a new framework for speech analysis based on morpho-phonological modelling.
- Byunggon Yang:
Perceptual contrast in the Korean and English vowel system normalized.
- Yong-Ju Lee, Sook-Hyang Lee:
On phonetic characteristics of pause in the Korean read speech.
- Sami Boudelaa, Mehdi Meftah:
Cross-language effects of lexical stress in word recognition: the case of Arabic English bilinguals.
- Maria-Barbara Wesenick:
Automatic generation of German pronunciation variants.
- Maria-Barbara Wesenick, Andreas Kipp:
Estimating the quality of phonetic transcriptions and segmentations of speech signals.
- Bojan Petek, Rastislav Sustarsic, Smiljana Komar:
An acoustic analysis of contemporary vowels of the standard Slovenian language.
- Sandrine Robbe, Anne Bonneau, Sylvie Coste, Yves Laprie:
Using decision trees to construct optimal acoustic cues.
- Donna Erickson, Osamu Fujimura:
Maximum jaw displacement in contrastive emphasis.
- Rebecca Herman, Mary E. Beckman, Kiyoshi Honda:
Subglottal pressure and final lowering in English.
- Cecile T. L. Kuijpers, Wilma van Donselaar, Anne Cutler:
Phonological variation: epenthesis and deletion of schwa in Dutch.
Spoken Language Processing for Special Populations
- James J. Mahshie:
Feedback considerations for speech training systems.
- Anne-Marie Öster:
Clinical applications of computer-based speech training for children with hearing impairment.
- Valérie Hazan, Andrew Simpson:
Enhancing information-rich regions of natural VCV and sentence materials presented in noise.
- Valérie Hazan, Alan Adlard:
Speech perceptual abilities of children with specific reading difficulty (dyslexia).
- Larry D. Paarmann, Michael K. Wynne:
Bimodal perception of spectrum compressed speech.
- Dragana Barac-Cikoja, Sally Revoile:
Effect of sentential context on syllabic stress perception by hearing-impaired listeners.
- Martin Russell, Catherine Brown, Adrian Skilling, Robert W. Series, Julie L. Wallace, Bill Bohnam, Paul Barker:
Applications of automatic speech recognition to speech and language development in young children.
- D. R. Campbell:
Sub-band adaptive speech enhancement for hearing aids.
- Thomas Portele, Jürgen Krämer:
Adapting a TTS system to a reading machine for the blind.
Dialogue Special Sessions
- Katsuhiko Shirai:
Modeling of spoken dialogue with and without visual information.
- Stephanie Seneff, David Goddeau, Christine Pao, Joseph Polifroni:
Multimodal discourse modelling in a multi-user multi-domain environment.
- Kenji Kita, Yoshikazu Fukui, Masaaki Nagata, Tsuyoshi Morimoto:
Automatic acquisition of probabilistic dialogue models.
- Paul Heisterkamp, Scott McGlashan:
Units of dialogue management: an example.
- Sharon L. Oviatt, Robert VanGent:
Error resolution during multimodal human-computer interaction.
- Ramesh R. Sarukkai, Dana H. Ballard:
Improved spontaneous dialogue recognition using dialogue and utterance triggers by adaptive probability boosting.
- Kai Hübener, Uwe Jost, Henrik Heine:
Speech recognition for spontaneously spoken German dialogues.
- Paul Taylor, Hiroshi Shimodaira, Stephen Isard, Simon King, Jacqueline C. Kowtko:
Using prosodic information to constrain language models for spoken dialogue.
- Peter A. Heeman, Kyung-ho Loken-Kim, James F. Allen:
Combining the detection and correction of speech repairs.
- Yuji Sagawa, Wataru Sugimoto, Noboru Ohnishi:
Generating spontaneous elliptical utterance.
- Gösta Bruce, Marcus Filipsson, Johan Frid, Björn Granström, Kjell Gustafson, Merle Horne, David House, Birgitta Lastow, Paul Touati:
Developing the modelling of Swedish prosody in spontaneous dialogue.
- Shimei Pan, Kathleen McKeown:
Spoken language generation in a multimedia system.
- Keikichi Hirose, Mayumi Sakata, Hiromichi Kawanami:
Synthesizing dialogue speech of Japanese based on the quantitative analysis of prosodic features.
- Shuichi Tanaka, Shu Nakazato, Keiichiro Hoashi, Katsuhiko Shirai:
Spoken dialogue interface in a dual task situation.
- Yasuhisa Niimi, Yutaka Kobayashi:
A dialogue control strategy based on the reliability of speech recognition.
- Alexander I. Rudnicky, Stephen Reed, Eric H. Thayer:
Speechwear: a mobile speech system.
- Helen M. Meng, Senis Busayapongchai, James R. Glass, David Goddeau, I. Lee Hetherington, Edward Hurley, Christine Pao, Joseph Polifroni, Stephanie Seneff, Victor Zue:
WHEELS: a conversational system in the automobile classifieds domain.
- M. David Sadek, A. Ferrieux, A. Cozannet, Philippe Bretier, Franck Panaget, J. Simonin:
Effective human-computer cooperative spoken dialogue: the AGS demonstrator.
- Samir Bennacef, Laurence Devillers, Sophie Rosset, Lori Lamel:
Dialog in the RAILTEL telephone-based system.
- Alon Lavie, Lori S. Levin, Yan Qu, Alex Waibel, Donna Gates, Marsal Gavaldà, Laura Mayfield, Maite Taboada:
Dialogue processing in a conversational speech translation system.
Language Modeling
- Thomas Niesler, Philip C. Woodland:
Combination of word-based and category-based language models.
- Francisco J. Valverde-Albacete, José Manuel Pardo:
A multi-level lexical-semantics based language model design for guided integrated continuous speech recognition.
- Florian Gallwitz, Elmar Nöth, Heinrich Niemann:
A category based approach for recognition of out-of-vocabulary words.
- Kristie Seymore, Ronald Rosenfeld:
Scalable backoff language models.
- Rukmini Iyer, Mari Ostendorf:
Modeling long distance dependence in language: topic mixtures vs. dynamic cache models.
- Marcello Federico:
Bayesian estimation methods for n-gram language model adaptation.
- Man-Hung Siu, Mari Ostendorf:
Modeling disfluencies in conversational speech.
- John Miller, Fil Alleva:
Evaluation of a language model using a clustered model backoff.
- Antonio Bonafonte, José B. Mariño:
Language modeling using x-grams.
- Klaus Ries, Finn Dag Buø, Alex Waibel:
Class phrase models for language modelling.
- Petra Geutner:
Introducing linguistic constraints into statistical language modeling.
- Jianying Hu, William Turin, Michael K. Brown:
Language modeling with stochastic automata.
Feature Extraction for Speech Recognition
- Don X. Sun:
Feature dimension reduction using reduced-rank maximum likelihood estimation for hidden Markov models.
- Kai Hübener:
Using multi-level segmentation coefficients to improve HMM speech recognition.
- Thomas Eisele, Reinhold Haeb-Umbach, Detlev Langmann:
A comparative study of linear feature transformation techniques for automatic speech recognition.
- Ben Milner:
Inclusion of temporal information into features for speech recognition.
- Hubert Wassner, Gérard Chollet:
New cepstral representation using wavelet analysis and spectral transformation for robust speech recognition.
- Christopher John Long, Sekharajit Datta:
Wavelet based feature extraction for phoneme recognition.
- Andrzej Drygajlo:
New fast wavelet packet transform algorithms for frame synchronized speech processing.
- Srinivasan Umesh, Leon Cohen, Nenad Marinovic, Douglas J. Nelson:
Frequency-warping in speech.
- Daisuke Kobayashi, Shoji Kajita, Kazuya Takeda, Fumitada Itakura:
Extracting speech features from human speech-like noise.
- Shoji Kajita, Kazuya Takeda, Fumitada Itakura:
Subband-crosscorrelation analysis for robust speech recognition.
- Hervé Bourlard, Stéphane Dupont:
A new ASR approach based on independent processing and recombination of partial frequency bands.
- Climent Nadeu, José B. Mariño, Javier Hernando, Albino Nogueiras:
Frequency and time filtering of filter-bank energies for HMM speech recognition.
Speech Production - Measurement and Modeling
- Yves Laprie, Marie-Odile Berger:
Extraction of tongue contours in x-ray images with minimal user interaction.
- Didier Demolin, Thierry Metens, Alain Soquet:
Three-dimensional measurement of the vocal tract by MRI.
- Philip Gleason, Betty Tuller, J. A. Scott Kelso:
Syllable affiliation of final consonant clusters undergoes a phase transition over speaking rates.
- Arthur Lobo, Michael H. O'Malley:
Towards a biomechanical model of the larynx.
- Yann Morlec, Gérard Bailly, Véronique Aubergé:
Generating intonation by superposing gestures.
- Hideki Kawahara, Hiroko Kato, J. C. Williams:
Effects of auditory feedback on F0 trajectory generation.
Speech Coding / HMMs and NNs in ASR
- Ian S. Burnett, John J. Parry:
On the effects of accent and language on low rate speech coders.
- Jeng-Shyang Pan, Fergus R. McInnes, Mervyn A. Jack:
VQ codevector index assignment using genetic algorithms for noisy channels.
- Gavin C. Cawley:
An improved vector quantization algorithm for speech transmission over noisy channels.
- C. Murgia, Gang Feng, Alain Le Guyader, Catherine Quinquis:
Very low delay and high quality coding of 20 Hz-15 kHz speech signals at 64 kbit/s.
- Carlos M. Ribeiro, Isabel Trancoso:
Application of speaker modification techniques to phonetic vocoding.
- Tadashi Yonezaki, Kiyohiro Shikano:
Entropy coded vector quantization with hidden Markov models.
- Minoru Kohata:
An application of recurrent neural networks to low bit rate speech coding.
- Kazuhito Koishida, Keiichi Tokuda, Takao Kobayashi, Satoshi Imai:
CELP coding system based on mel-generalized cepstral analysis.
- Cheung-Fat Chan, Wai-Kwong Hui:
Wideband re-synthesis of narrowband CELP-coded speech using multiband excitation model.
- Takuya Koizumi, Mikio Mori, Shuji Taniguchi, Mitsutoshi Maruya:
Recurrent neural networks for phoneme recognition.
- M. A. Mokhtar, A. Zein-el-Abddin:
A model for the acoustic phonetic structure of Arabic language using a single ergodic hidden Markov model.
- Yifan Gong, Irina Illina, Jean Paul Haton:
Modelling long term variability information in mixture stochastic trajectory framework.
- Thierry Moudenc, Robert Sokol, Guy Mercier:
Segmental phonetic features recognition by means of neural-fuzzy networks and integration in an n-best solutions post-processing.
- Irina Illina, Yifan Gong:
Stochastic trajectory model with state-mixture for continuous speech recognition.
- Hermann Hild, Alex Waibel:
Recognition of spelled names over the telephone.
- Gilles Boulianne, Patrick Kenny:
Optimal tying of HMM mixture densities using decision trees.
- Hwan Jin Choi, Yung-Hwan Oh:
Speech recognition using an enhanced FVQ based on a codeword dependent distribution normalization and codeword weighting by fuzzy objective function.
- Mikko Kurimo, Panu Somervuo:
Using the self-organizing map to speed up the probability density estimation for speech recognition with mixture density HMMs.
Vowels
NNs and Stochastic Modeling
- Geunbae Lee, Jong-Hyeok Lee, Kyubong Park, Byung-Chang Kim:
Integrating connectionist, statistical and symbolic approaches for continuous spoken Korean processing.
- Hynek Hermansky, Sangita Timberwala, Misha Pavel:
Towards ASR on partially corrupted speech.
- Herbert Gish, Kenney Ng:
Parametric trajectory models for speech recognition.
- Kate Knill, M. J. F. Gales, Steve J. Young:
Use of Gaussian selection in large vocabulary continuous speech recognition using HMMs.
- J. Hogberg, Kåre Sjölander:
Cross phone state clustering using lexical stress and context.
- Eduardo Lleida-Solano, Richard C. Rose:
Likelihood ratio decoding and confidence measures for continuous speech recognition.
- Xiaohui Ma, Yifan Gong, Yuqing Fu, Jiren Lu, Jean Paul Haton:
A study on continuous Chinese speech recognition based on stochastic trajectory models.
- Yoshiaki Itoh, Jiro Kiyama, Hiroshi Kojima, Susumu Seki, Ryuichi Oka:
A proposal for a new algorithm of reference interval-free continuous DP for real-time speech or text retrieval.
- Akinori Ito, Masaki Kohda:
Language modeling by string pattern n-gram for Japanese speech recognition.
- Reinhard Kneser:
Statistical language modeling using a variable context length.
- Finn Tore Johansen:
A comparison of hybrid HMM architectures using global discriminative training.
- Wei Wei, Etienne Barnard, Mark A. Fanty:
Improved probability estimation with neural network models.
- Ha-Jin Yu, Yung-Hwan Oh:
A neural network using acoustic sub-word units for continuous speech recognition.
- Louis ten Bosch, Roel Smits:
On the error criteria in neural networks as a tool for human classification modelling.
- Gordon Ramsay:
A non-linear filtering approach to stochastic training of the articulatory-acoustic mapping using the EM algorithm.
- Y. P. Yang, John R. Deller Jr.:
A tool for automated design of language models.
- Felix Freitag, Enric Monte:
Acoustic-phonetic decoding based on Elman predictive neural networks.
- Tan Lee, P. C. Ching:
On improving discrimination capability of an RNN based recognizer.
- Yumi Wakita, Jun Kawai, Hitoshi Iida:
An evaluation of statistical language modeling for speech recognition using a mixed category of both words and parts-of-speech.
Neural Models of Speech Processing
- Boris Aleksandrovsky, James Whitson, Gretchen Andes, Gary Lynch, Richard Granger:
Novel speech processing mechanism derived from auditory neocortical circuit analysis.
- Ping Tang, Jean Rouat:
Modeling neurons in the anteroventral cochlear nucleus for amplitude modulation (AM) processing: application to speech sound.
- Halewijn Vereecken, Jean-Pierre Martens:
Noise suppression and loudness normalization in an auditory model-based acoustic front-end.
- James J. Hant, Brian Strope, Abeer Alwan:
A psychoacoustic model for the noise masking of voiceless plosive bursts.
- Martin Hunke, Thomas Holton:
Training machine classifiers to match the performance of human listeners in a natural vowel classification task.
- Kiyoaki Aikawa, Hideki Kawahara, Minoru Tsuzaki:
A neural matrix model for active tracking of frequency-modulated tones.
Utterance Verification and Word Spotting
- Richard C. Rose, Eduardo Lleida-Solano, G. W. Erhart, R. V. Grubbe:
A user-configurable system for voice label recognition.
- Philippe Gelin, Christian Wellekens:
Keyword spotting enhancement for video soundtrack indexing.
- Rachida El Méliani, Douglas D. O'Shaughnessy:
New efficient fillers for unlimited word recognition and keyword spotting.
- Michelle S. Spina, Victor Zue:
Automatic transcription of general audio data: preliminary analyses.
- Francis Kubala, Tasos Anastasakos, Hubert Jin, Long Nguyen, Richard M. Schwartz:
Transcribing radio news.
- Anand R. Setlur, Rafid A. Sukkar, John Jacob:
Correcting recognition errors via discriminative utterance verification.
Acquisition/Learning Training L2 Learners
Focus, Stress and Accent
Spoken Language Dialogue and Conversation
- Norbert Reithinger, Ralf Engel, Michael Kipp, Martin Klesen:
Predicting dialogue acts for a speech-to-speech translation system.
- Johannes Müller, Holger Stahl, Manfred Lang:
Automatic speech translation based on the semantic structure.
- Lewis M. Norton, Carl Weir, K. W. Scholz, Deborah A. Dahl, Ahmed Bouzid:
A methodology for application development for spoken language systems.
- Stephanie Seneff, Joseph Polifroni:
A new restaurant guide conversational system: issues in rapid prototyping for specialized domains.
- Tadahiko Kumamoto, Akira Ito:
Semantic interpretation of a Japanese complex sentence in an advisory dialogue - focused on the postpositional word "KEDO," which works as a conjunction between clauses.
- Youngkuk Hong, Myoung-Wan Koo, Gijoo Yang:
A Korean morphological analyzer for speech translation system.
- Rolf Carlson, Sheri Hunnicutt:
Generic and domain-specific aspects of the Waxholm NLP and dialog modules.
- Megumi Kameyama, Goh Kawai, Isao Arima:
A real-time system for summarizing human-human spontaneous spoken dialogues.
- Bernd Hildebrandt, Heike Rautenstrauch, Gerhard Sagerer:
Evaluation of spoken language understanding and dialogue systems.
- Kuniko Kakita:
Inter-speaker interaction of F0 in dialogs.
- Hans Brandt-Pook, Gernot A. Fink, Bernd Hildebrandt, Franz Kummert, Gerhard Sagerer:
A robust dialogue system for making an appointment.
- Kazuyuki Takagi, Shuichi Itahashi:
Segmentation of spoken dialogue by interjections, disfluent utterances and pauses.
- David Goddeau, Helen M. Meng, Joseph Polifroni, Stephanie Seneff, Senis Busayapongchai:
A form-based dialogue manager for spoken language applications.
- Steve Whittaker, David Attwater:
The design of complex telephony applications using large vocabulary speech technology.
- Stephen Sutton, David G. Novick, Ronald A. Cole, Pieter J. E. Vermeulen, Jacques de Villiers, Johan Schalkwyk, Mark A. Fanty:
Building 10,000 spoken dialogue systems.
- Yen-Ju Yang, Lee-Feng Chien, Lin-Shan Lee:
Speaker intention modeling for large vocabulary Mandarin spoken dialogues.
- P. E. Kenne, Mary O'Kane:
Hybrid language models and spontaneous legal discourse.
- P. E. Kenne, Mary O'Kane:
Topic change and local perplexity in spoken legal dialogue.
- Jennifer J. Venditti, Marc Swerts:
Intonational cues to discourse structure in Japanese.
- Niels Ole Bernsen, Hans Dybkjær, Laila Dybkjær:
Principles for the design of cooperative spoken human-machine dialogue.
- Karen L. Jenkin, Michael S. Scordilis:
Development and comparison of three syllable stress classifiers.
Speech Disorders
- Donald G. Jamieson, Li Deng, M. Price, Vijay Parsa, J. Till:
Interaction of speech disorders with speech coders: effects on speech intelligibility.
- Maurílio Nunes Vieira, Arnold G. D. Maran, Fergus R. McInnes, Mervyn A. Jack:
Detecting arytenoid cartilage misplacement through acoustic and electroglottographic jitter analysis.
- Maurílio Nunes Vieira, Fergus R. McInnes, Mervyn A. Jack:
Robust F0 and jitter estimation in pathological voices.
- F. Plante, H. Kessler, Barry M. G. Cheetham, J. Earis:
Speech monitoring of infective laryngitis.
- Jean Schoentgen, Raoul De Guchteneere:
Searching for nonlinear relations in whitened jitter time series.
- Liliana Gavidia-Ceballos, John H. L. Hansen, James F. Kaiser:
Vocal fold pathology assessment using AM autocorrelation analysis of the Teager energy operator.
- David P. Kuehn:
Continuous positive airway pressure (CPAP) in the treatment of hypernasality.
- Carol Y. Espy-Wilson, Venkatesh R. Chari, Caroline B. Huang:
Enhancement of alaryngeal speech by adaptive filtering.
- Li Deng, Xuemin Shen, Donald G. Jamieson, J. Till:
Simulation of disordered speech using a frequency-domain vocal tract model.
- Yasuo Endo, Hideki Kasuya:
A stochastic model of fundamental period perturbation and its application to perception of pathological voice quality.
- Eric J. Wallen, John H. L. Hansen:
A screening test for speech pathology assessment using objective quality measures.
- Douglas A. Cairns, John H. L. Hansen, James F. Kaiser:
Recent advances in hypernasal speech detection using the nonlinear Teager energy operator.
Vocal Tract Geometry
- Kiyoshi Honda, Shinji Maeda, Michiko Hashi, Jim Dembowski, John R. Westbury:
Human palate and related structures: their articulatory consequences.
- Edward P. Davis, Andrew Douglas, Maureen C. Stone:
A continuum mechanics representation of tongue deformation.
- Philbert Bangayan, Abeer Alwan, Shrikanth Narayanan:
From MRI and acoustic data to articulatory synthesis: a case study of the lateral approximants in American English.
- Shrikanth Narayanan, Abigail Kaun, Dani Byrd, Peter Ladefoged, Abeer Alwan:
Liquids in Tamil.
- Chang-Sheng Yang, Hideki Kasuya:
Speaker individualities of vocal tract shapes of Japanese vowels measured by magnetic resonance images.
- S. El-Masri, Xavier Pelorson, P. Saguet, Pierre Badin:
Vocal tract acoustics using the transmission line matrix (TLM) method.
- Gérard Bailly:
Building sensori-motor prototypes from audiovisual exemplars.
- Mats Båvegård, Gunnar Fant:
Parameterized VT area function inversion.
- Jianwu Dang, Kiyoshi Honda:
An improved vocal tract model of vowel production implementing piriform resonance and transvelar nasal coupling.
- C. S. Blackburn, Steve J. Young:
Pseudo-articulatory speech synthesis for recognition using automatic feature extraction from x-ray data.
Prosody in ASR and Segmentation
- Sharon L. Oviatt, Gina-Anne Levow, Margaret MacEachern, Karen Kuhn:
Modeling hyperarticulate speech during human-computer error resolution.
- Siripong Potisuk, Mary P. Harper, Jackson T. Gandour:
Using stress to disambiguate spoken Thai sentences containing syntactic ambiguity.
- Hung-yun Hsieh, Ren-Yuan Lyu, Lin-Shan Lee:
Use of prosodic information to integrate acoustic and linguistic knowledge in continuous Mandarin speech recognition with very large vocabulary.
- G. V. Ramana Rao, J. Srichand:
Word boundary detection using pitch variations.
- Atsuhiro Sakurai, Keikichi Hirose:
Detection of phrase boundaries in Japanese by low-pass filtering of fundamental frequency contours.
- Vincent Pagel, Noelle Carbonell, Yves Laprie:
A new method for speech delexicalization, and its application to the perception of French prosody.
Acquisition and Learning by Machine
Dialogue Systems
- Jean-Luc Gauvain, J. J. Gangolf, Lori Lamel:
Speech recognition for an information kiosk.
- Helmer Strik, Albert Russel, Henk van den Heuvel, Catia Cucchiarini, Lou Boves:
Localizing an automatic inquiry system for public transport information.
- Stephen M. Marcus, Deborah W. Brown, Randy G. Goldberg, Max S. Schoeffler, William R. Wetzel, Richard R. Rosinski:
Prompt constrained natural language - evolving the next generation of telephony services.
- Tatsuya Kawahara, Chin-Hui Lee, Biing-Hwang Juang:
Key-phrase detection and verification for flexible speech understanding.
- Bernhard Suhm, Brad A. Myers, Alex Waibel:
Interactive recovery from speech recognition errors in speech user interfaces.
- Sunil Issar:
Estimation of language models for new spoken language applications.
Speech Enhancement and Robust Processing
- Xuemin Shen, Li Deng, Anisa Yasmin:
H-infinity filtering for speech enhancement.
- Saeed Vaseghi, Ben P. Milner:
A comparative analysis of channel-robust features and channel equalization methods for speech recognition.
- Jia-Lin Shen, Wen-Liang Hwang, Lin-Shan Lee:
Robust speech recognition features based on temporal trajectory filtering of frequency band spectrum.
- Kevin Power:
Durational modelling for improved connected digit recognition.
- Carlos Avendaño, Hynek Hermansky:
Study on the dereverberation of speech based on temporal envelope filtering.
- Thorsten Brants:
Estimating Markov model structures.
- Eric K. Ringger, James F. Allen:
A fertility channel model for post-correction of continuous speech recognition.
- Hiroshi Yasukawa:
Restoration of wide band signal from telephone speech using linear prediction error processing.
- Hiroshi Matsumoto, Noboru Naitoh:
Smoothed spectral subtraction for a frequency-weighted HMM in noisy speech recognition.
- William S. Woods, Martin Hansen, Thomas Wittkop, Birger Kollmeier:
A simple architecture for using multiple cues in sound separation.
- Bojan Petek, Ove Andersen, Paul Dalsgaard:
On the robust automatic segmentation of spontaneous speech.
- C. G. Miglietta, Chafic Mokbel, Denis Jouvet, Jean Monné:
Bayesian adaptation of speech recognizers to field speech data.
- A. J. Darlington, D. J. Campbell:
Sub-band adaptive filtering applied to speech enhancement.
- J. P. Openshaw, John S. Mason:
Noise robust estimate of speech dynamics for speaker recognition.
- Javier Ortega-Garcia, Joaquin Gonzalez-Rodriguez:
Overview of speech enhancement techniques for automatic speaker recognition.
- Naomi Harte, Saeed Vaseghi, Ben P. Milner:
Dynamic features for segmental speech recognition.
- Takuya Koizumi, Mikio Mori, Shuji Taniguchi:
Speech recognition based on a model of human auditory system.
- Josep M. Salavedra, Enrique Masgrau:
APVQ encoder applied to wideband speech coding.
- Jin Zhou, Yair Shoham, Ali N. Akansu:
Simple fast vector quantization of the line spectral frequencies.
Speaker Adaptation and Normalization I
- Tomoko Matsui, Sadaoki Furui:
N-best-based instantaneous speaker adaptation method for speech recognition.
- Claude Montacié, Marie-José Caraty, Claude Barras:
Mixture splitting technic and temporal control in a HMM-based recognition system.
- Lei Yao, Dong Yu, Taiyi Huang:
A unified spectral transformation adaptation approach for robust speech recognition.
- Qiang Huo, Chin-Hui Lee:
On-line adaptive learning of the correlated continuous density hidden Markov models for speech recognition.
- Nikko Ström:
Speaker adaptation by modeling the speaker variation in a continuous speech recognition system.
- Yasuo Ariki, Shigeaki Tagashira:
An enquiring system of unknown words in TV news by spontaneous repetition (application of speaker normalization by speaker subspace projection).
- Jin-Song Zhang, Beiqian Dai, Changfu Wang, HingKeung Kwan, Keikichi Hirose:
Adaptive recognition method based on posterior use of distribution pattern of output probabilities.
- Philip C. Woodland, D. Pye, M. J. F. Gales:
Iterative unsupervised adaptation using maximum likelihood linear regression.
- Tasos Anastasakos, John W. McDonough, Richard M. Schwartz, John Makhoul:
A compact model for speaker-adaptive training.
- Shigeru Homma, Jun-ichi Takahashi, Shigeki Sagayama:
Iterative unsupervised speaker adaptation for batch dictation.
- Daniel C. Burnett, Mark A. Fanty:
Rapid unsupervised adaptation to children's speech on a connected-digit task.
- Jun Ishii, Masahiro Tonomura, Shoichi Matsunaga:
Speaker adaptation using tree structured shared-state HMMs.
Spoken Language and NLP
- Richard M. Schwartz, Scott Miller, David Stallard, John Makhoul:
Language understanding using hidden understanding models.
- Allen L. Gorin:
Processing of semantic information in fluently spoken language.
- Andreas Stolcke, Elizabeth Shriberg:
Automatic linguistic segmentation of conversational speech.
- Manuela Boros, Wieland Eckert, Florian Gallwitz, Günther Görz, Gerhard Hanrieder, Heinrich Niemann:
Towards understanding spontaneous speech: word accuracy vs. concept accuracy.
- Wolfgang Minker, Samir Bennacef, Jean-Luc Gauvain:
A stochastic case frame approach for natural language understanding.
- Frank Seide, Bernhard Rueber, Andreas Kellner:
Improving speech understanding by incorporating database constraints and dialogue history.
- Finn Dag Buø, Alex Waibel:
Learning to parse spontaneous speech.
- Jean-Yves Antoine:
Spontaneous speech and natural language processing ALPES: a robust semantic-led parser.
- J. Alvarez-Cercadillo, F. Javier Caminero-Gil, C. Crespo-Casas, Daniel Tapias Merino:
The natural language processing module for a voice assisted operator at Telefónica I+D.
- André Berton, Pablo Fetter, Peter Regel-Brietzmann:
Compound words in large-vocabulary German speech recognition systems.
- Anton Batliner, Anke Feldhaus, Stefan Geißler, Tibor Kiss, Ralf Kompe, Elmar Nöth:
Prosody, empty categories and parsing - a success story.
- B. Srinivas:
"almost parsing" technique for language modeling.
Spoken Discourse Analysis/Synthesis
Acoustic Modeling
- Christian-Michael Westendorf, Jens Jelitto:
Learning pronunciation dictionary from speech data.
- C. Rathinavelu, Li Deng:
The trended HMM with discriminative training for phonetic classification.
- Ariane Lazaridès, Yves Normandin, Roland Kuhn:
Improving decision trees for acoustic modeling.
- Gongjun Li, Taiyi Huang:
An improved training algorithm in HMM-based speech recognition.
- Ji Ming, Peter O'Boyle, John G. McMahon, F. Jack Smith:
Speech recognition using a strong correlation assumption for the instantaneous spectra.
- Pau Pachès-Leal, Climent Nadeu:
On parameter filtering in continuous subword-unit-based speech recognition.
- Shigeki Okawa, Katsuhiko Shirai:
Estimation of statistical phoneme center considering phonemic environments.
- Xue Wang, Louis ten Bosch, Louis C. W. Pols:
Integration of context-dependent durational knowledge into HMM-based speech recognition.
- Toshiaki Fukada, Michiel Bacchiani, Kuldip K. Paliwal, Yoshinori Sagisaka:
Speech recognition based on acoustically derived segment units.
- Rivarol Vergin, Azarshid Farhat, Douglas D. O'Shaughnessy:
Robust gender-dependent acoustic-phonetic modelling in continuous speech recognition based on a new automatic male/female classification.
- Tae-Young Yang, Won-Ho Shin, Weon-Goo Kim, Dae Hee Youn:
A codebook adaptation algorithm for SCHMM using formant distribution.
- Jacques Simonin, S. Bodin, Denis Jouvet, Katarina Bartkova:
Parameter tying for flexible speech recognition.
- Tsuneo Nitta, Shin'ichi Tanaka, Yasuyuki Masai, Hiroshi Matsuura:
Word-spotting based on inter-word and intra-word diphone models.
- Antonio Bonafonte, Josep Vidal, Albino Nogueiras:
Duration modeling with expanded HMM applied to speech recognition.
- Ricardo de Córdoba, José Manuel Pardo:
Different strategies for distribution clustering using discrete, semicontinuous and continuous HMMs in CSR.
- Ilija Zeljkovic, Shrikanth Narayanan:
Improved HMM phone and triphone models for realtime ASR telephony applications.
- Yasuhiro Minami, Sadaoki Furui:
Improved extended HMM composition by incorporating power variance.
- Gordon Ramsay, Li Deng:
Optimal filtering and smoothing for speech recognition using a stochastic target model.
- Zhihong Hu, Johan Schalkwyk, Etienne Barnard, Ronald A. Cole:
Speech recognition using syllable-like units.
- Jean-Claude Junqua, Lorenzo Vassallo:
Context modeling and clustering in continuous speech recognition.
- Li Deng, Jim Jian-Xiong Wu:
Hierarchical partition of the articulatory state space for overlapping-feature based speech recognition.
- Olivier Oppizzi, David Fournier, Philippe Gilles, Henri Meloni:
A fuzzy acoustic-phonetic decoder for speech recognition.
- Katrin Kirchhoff:
Syllable-level desynchronisation of phonetic features for speech recognition.
- James R. Glass, Jane W. Chang, Michael K. McCandless:
A probabilistic framework for feature-based speech recognition.
- Jim Jian-Xiong Wu, Li Deng, Jacky Chan:
Modeling context-dependent phonetic units in a continuous speech recognition system for Mandarin Chinese.
Physics and Simulation of the Vocal Tract
Duration and Rhythm
Acoustic Analysis
- Goangshiuan S. Ying, Leah H. Jamieson, Carl D. Mitchell:
A probabilistic approach to AMDF pitch detection.
- Alain Soquet, Véronique Lecuit, Thierry Metens, Didier Demolin:
From sagittal cut to area function: an RMI investigation.
- Léonard Janer, Juan José Bonet, Eduardo Lleida-Solano:
Pitch detection and voiced/unvoiced decision algorithm based on wavelet transforms.
- Yannis Stylianou:
Decomposition of speech signals into a deterministic and a stochastic part.
- Cheol-Woo Jo, Ho-Gyun Bang, William A. Ainsworth:
Improved glottal closure instant detector based on linear prediction and standard pitch concept.
- Xihong Wang, Stephen A. Zahorian, Stefan Auberg:
Analysis of speech segments using variable spectral/temporal resolution.
- Brian Eberman, William Goldenthal:
Time-based clustering for phonetic segmentation.
- Parham Zolfaghari, Tony Robinson:
Formant analysis using mixtures of Gaussians.
- Hywel B. Richards, John S. Mason, Melvyn J. Hunt, John S. Bridle:
Deriving articulatory representations from speech with various excitation modes.
- Manish Sharma, Richard J. Mammone:
"blind" speech segmentation: automatic segmentation of speech without linguistic knowledge.
- Hiroshi Ohmura, Kazuyo Tanaka:
Speech synthesis using a nonlinear energy damping model for the vocal folds vibration effect.
- Munehiro Namba, Hiroyuki Kamata, Yoshihisa Ishida:
Neural networks learning with L1 criteria and its efficiency in linear prediction of speech signals.
- Anna Esposito, Eugène C. Ezin, M. Ceccarelli:
Preprocessing and neural classification of English stop consonants [b, d, g, p, t, k].
- K. S. Ananthakrishnan:
A comparison of modified k-means (MKM) and NN based real time adaptive clustering algorithms for articulatory space codebook formation.
- Wen Ding, Hideki Kasuya:
A novel approach to the estimation of voice source and vocal tract parameters from speech signals.
- Hartmut R. Pfitzinger, Susanne Burger, Sebastian Heid:
Syllable detection in read and spontaneous speech.
- Kuansan Wang, Chin-Hui Lee, Biing-Hwang Juang:
Maximum likelihood learning of auditory feature maps for stationary vowels.
- Antonio Bonafonte, Albino Nogueiras, Antonio Rodriguez-Garrido:
Explicit segmentation of speech using Gaussian models.
- E. Mousset, William A. Ainsworth, José A. R. Fonollosa:
A comparison of several recent methods of fundamental frequency and voicing decision estimation.
- Toshihiko Abe, Takao Kobayashi, Satoshi Imai:
Robust pitch estimation with harmonics enhancement in noisy environments based on instantaneous frequency.
- Asunción Moreno, Miquel Rutllán:
Integrated polispectrum on speech recognition.
Speech Recognition Using HMMs and NNs
- Joao P. Neto, Ciro Martins, Luís B. Almeida:
An incremental speaker-adaptation technique for hybrid HMM-MLP recognizer.
- Youngjoo Suh, Youngjik Lee:
Phoneme segmentation of continuous speech using multi-layer perceptron.
- Jeff Bilmes, Nelson Morgan, Su-Lin Wu, Hervé Bourlard:
Stochastic perceptual speech models with durational dependence.
- G. D. Cook, Anthony J. Robinson:
Boosting the performance of connectionist large vocabulary speech recognition.
- Nicolas Pican, Dominique Fohr, Jean-François Mari:
HMMs and OWE neural network for continuous speech recognition.
- Steve Waterhouse, Dan J. Kershaw, Tony Robinson:
Smoothed local adaptation of connectionist systems.
Adverse Environments and Multiple Microphones
- Takeshi Yamada, Satoshi Nakamura, Kiyohiro Shikano:
Robust speech recognition with speaker localization by a microphone array.
- Ea-Ee Jan, James L. Flanagan:
Sound source localization in reverberant environments using an outlier elimination algorithm.
- Dan J. Kershaw, Tony Robinson, Steve Renals:
The 1995 ABBOT LVCSR system for multiple unknown microphones.
- Diego Giuliani, Maurizio Omologo, Piergiorgio Svaizer:
Experiments of speech recognition in a noisy and reverberant environment using a microphone array and HMM.
- Joaquin Gonzalez-Rodriguez, Javier Ortega-Garcia, César Martin, Luis Hernández:
Increasing robustness in GMM speaker recognition systems for noisy and reverberant speech with low complexity microphone arrays.
- Kuan-Chieh Yen, Yunxin Zhao:
Robust automatic speech recognition using a multi-channel signal separation front-end.
Prosodic Synthesis in Dialogue
Speech Synthesis
- Richard Sproat:
Multilingual text analysis for text-to-speech synthesis.
- Yoshifumi Ooyama, Hisako Asano, Koji Matsuoka:
Spoken-style explanation generator for Japanese kanji using a text-to-speech system.
- Ken-ichi Magata, Tomoki Hamagami, Mitsuo Komura:
A method for estimating prosodic symbol from text for Japanese text-to-speech synthesis.
- Eduardo López Gonzalo, Jose M. Rodriguez-Garcia:
Statistical methods in data-driven modeling of Spanish prosody for text to speech.
- Jung-Chul Lee, Youngjik Lee, Sanghun Kim, Minsoo Hahn:
Intonation processing for TTS using stylization and neural network learning method.
- Alan W. Black, Andrew Hunt:
Generating F0 contours from ToBI labels using linear regression.
- Wern-Jun Wang, Shaw-Hwa Hwang, Sin-Horng Chen:
The broad study of homograph disambiguity for Mandarin speech synthesis.
- Thierry Dutoit, Vincent Pagel, Nicolas Pierret, F. Bataille, Olivier van der Vrecken:
The MBROLA project: towards a set of high quality speech synthesizers free of use for non commercial purposes.
- Makoto Hashimoto, Norio Higuchi:
Training data selection for voice conversion using speaker selection and vector field smoothing.
- Ki-Seung Lee, Dae Hee Youn, Il-Whan Cha:
A new voice transformation method based on both linear and nonlinear prediction analysis.
- Geneviève Baudoin, Yannis Stylianou:
On the transformation of the speech spectrum for voice conversion.
- Cristina Delogu, Andrea Paoloni, Susanna Ragazzini, Paola Ridolfi:
Spectral analysis of synthetic speech and natural speech with noise over the telephone line.
- Weizhong Zhu, Hideki Kasuya:
A new speech synthesis system based on the ARX speech production model.
- Geraldo Lino de Campos, Evandro B. Gouvêa:
Speech synthesis using the CELP algorithm.
- Shaw-Hwa Hwang, Sin-Horng Chen, Yih-Ru Wang:
A Mandarin text-to-speech system.
- Mike D. Edgington, A. Lowry:
Residual-based speech modification algorithms for text-to-speech synthesis.
- Per Olav Heggtveit:
A generalized LR parser for text-to-speech synthesis.
- M. P. Pollard, Barry M. G. Cheetham, C. C. Goodyear, Mike D. Edgington, A. Lowry:
Enhanced shape-invariant pitch and time-scale modification for concatenative speech synthesis.
- Yasuhiko Arai, Ryo Mochizuki, Hirofumi Nishimura, Takashi Honda:
An excitation synchronous pitch waveform extraction method and its application to the VCV-concatenation synthesis of Japanese spoken words.
- Ren-Hua Wang, Qingfeng Liu, Difei Tang:
A new Chinese text-to-speech system with high naturalness.
- Ansgar Rinscheid:
Voice conversion based on topological feature maps and time-variant filtering.
Instructional Technology for Spoken Language
Multimodal Spoken Language Processing
- Lynne E. Bernstein, Christian Benoît:
For speech perception by humans or machines, three senses are better than one.
- Kaoru Sekiyama, Yoh'ichi Tohkura, Michio Umeda:
A few factors which affect the degree of incorporating lip-read information into speech perception.
- Eric Vatikiotis-Bateson, Kevin G. Munhall, Y. Kasahara, Frederique Garcia, Hani Yehia:
Characterizing audiovisual information during speech.
- Charlotte M. Reed:
The implications of the Tadoma method of speechreading for spoken language processing.
- Ruth Campbell:
Seeing speech in space and time: psychological and neurological findings.
- Kerry P. Green:
Studies of the McGurk effect: implications for theories of speech perception.
- N. M. Brooke:
Using the visual component in automatic speech recognition.
- Robert E. Remez:
Perceptual organization of speech in one and several modalities: common functions, common resources.
- David B. Pisoni, Helena M. Saldaña, Sonya M. Sheffert:
Multi-modal encoding of speech in memory: a first report.
Prosody - Phonological/Phonetic Measures
Phonetics and Perception
Language Acquisition
- Jean E. Andruski, Patricia K. Kuhl:
The acoustic structure of vowels in mothers' speech to infants and adults.
- Chris J. Clement, Florien J. Koopmans-van Beinum, Louis C. W. Pols:
Acoustical characteristics of sound production of deaf and normally hearing infants.
- John Kingston, Christine Bartels, José Benkí, Deanna Moore, Jeremy Rice, Rachel Thorburn, Neil Macmillan:
Learning non-native vowel categories.
- Pierre A. Hallé, Toshisada Deguchi, Yuji Tamekawa, Benedicte de Boysson-Bardies, Shigeru Kiritani:
Word recognition by Japanese infants.
- Peter W. Jusczyk:
Investigations of the word segmentation abilities of infants.
- Akiko Hayashi, Yuji Tamekawa, Toshisada Deguchi, Shigeru Kiritani:
Developmental change in perception of clause boundaries by 6- and 10-month-old Japanese infants.
Production and Prosody Posters
- Paavo Alku, Erkki Vilkman:
A frequency domain method for parametrization of the voice source.
- Krzysztof Marasek:
Glottal correlates of the word stress and the tense/lax opposition in German.
- Suzanne Boyce, Carol Y. Espy-Wilson:
Coarticulatory stability in American English /r/.
- Shinobu Masaki, Reiko Akahane-Yamada, Mark K. Tiede, Yasuhiro Shimada, Ichiro Fujimoto:
An MRI-based analysis of the English /r/ and /l/ articulations.
- David van Kuijk:
Does lexical stress or metrical stress better predict word boundaries in Dutch?
- Alan Wrench, Alan D. McIntosh, William J. Hardcastle:
Optopalatograph (OPG): a new apparatus for speech production analysis.
- René Carré:
Prediction of vowel systems using a deductive approach.
- Sheila J. Mair, Celia Scully, Christine H. Shadle:
Distinctions between [t] and [tch] using electropalatography data.
- Michiko Hashi, Raymond D. Kent, John R. Westbury, Mary J. Lindstrom:
Relating formants and articulation in intelligibility test words.
- Imad Znagui, Mohamed Yeou:
The role of coarticulation in the perception of vowel quality in modern standard Arabic.
- Simon Arnfield, Wilf Jones:
Updating the Reading EPG.
- Goangshiuan S. Ying, Leah H. Jamieson, Ruxin Chen, Carl D. Mitchell:
Lexical stress detection on stress-minimal word pairs.
- Jing Wang:
An acoustic study of the interaction between stressed and unstressed syllables in spoken Mandarin.
- Nobuaki Minematsu, Seiichi Nakagawa:
Automatic detection of accent nuclei at the head of words for speech recognition.
- Fu-Chiang Chou, Chiu-yu Tseng, Lin-Shan Lee:
Automatic generation of prosodic structure for high quality Mandarin speech synthesis.
- Tomoki Hamagami, Ken-ichi Magata, Mitsuo Komura:
A study on Japanese prosodic pattern and its modeling in restricted speech.
- Steve Hoskins:
A phonetic study of focus in intransitive verb sentences.
- Stefan Rapp:
Goethe for prosody.
- K. A. Straub:
Prosodic cues in syntactically ambiguous strings; an interactive speech planning mechanism.
- Jinfu Ni, Ren-Hua Wang, Deyu Xia:
A functional model for generation of the local components of F0 contours in Chinese.
- Marie Fellbaum:
The acquisition of voiceless stops in the interlanguage of second language learners of English and Spanish.
User-Machine Interfaces
- Brian Mellor, Chris Baber, C. Tunley:
Evaluating automatic speech recognition as a component of a multi-input device human-computer interface.
- A. Life, I. Salter, Jean-Noel Temem, F. Bernard, Sophie Rosset, Samir Bennacef, Lori Lamel:
Data collection for the MASK kiosk: WOz vs prototype system.
- M. Karaorman, Ted H. Applebaum, T. Itoh, M. Endo, Y. Ohno, M. Hoshimi, Takahiro Kamai, Kenji Matsui, Kazue Hata, Steve Pearson, Jean-Claude Junqua:
An experimental Japanese/English interpreting video phone system.
- Sara Basson, Stephen Springer, Cynthia Fong, Hong C. Leung, Edward Man, Michele Olson, John F. Pitrelli, Ranvir Singh, Suk Wong:
User participation and compliance in speech automated telecommunications applications.
- Samuel Bayer:
Embedding speech in web interfaces.
- Toshihiro Isobe, Masatoshi Morishima, Fuminori Yoshitani, Nobuo Koizumi, Ken'ya Murakami:
Voice-activated home banking system and its field trial.
TTS Systems and Rules
- Sangho Lee, Yung-Hwan Oh:
A text analyzer for Korean text-to-speech systems.
- Helen E. Karn:
Design and evaluation of a phonological phrase parser for Spanish text-to-speech.
- Ove Andersen, Roland Kuhn, Ariane Lazaridès, Paul Dalsgaard, Jürgen Haas, Elmar Nöth:
Comparison of two tree-structured approaches for grapheme-to-phoneme conversion.
- M. J. Adamson, Robert I. Damper:
A recurrent network that learns to pronounce English text.
- Eleonora Cavalcante Albano, Agnaldo Antonio Moreira:
Archisegment-based letter-to-phone conversion for concatenative speech synthesis in Portuguese.
- Yuki Yoshida, Shin'ya Nakajima, Kazuo Hakoda, Tomohisa Hirokawa:
A new method of generating speech synthesis units based on phonological knowledge and clustering technique.
Prosody and Labeling
- Martine Grice, Matthias Reyelt, Ralf Benzmüller, Jörg Mayer, Anton Batliner:
Consistency in transcription and labelling of German intonation with GToBI.
- Anton Batliner, Ralf Kompe, Andreas Kießling, Heinrich Niemann, Elmar Nöth:
Syntactic-prosodic labeling of large spontaneous speech data-bases.
- Florien J. Koopmans-van Beinum, Monique E. van Donzel:
Relationship between discourse structure and dynamic speech rate.
- Nigel Ward:
Using prosodic clues to decide when to produce back-channel utterances.
- Marion Mast, Ralf Kompe, Stefan Harbeck, Andreas Kießling, Heinrich Niemann, Elmar Nöth, Ernst Günter Schukat-Talamazzini, Volker Warnke:
Dialog act classification with the help of prosody.
- David van Kuijk, Henk van den Heuvel, Lou Boves:
Using lexical stress in continuous speech recognition for Dutch.
Speaker/Language Identification and Verification
- Karsten Kumpf, Robin W. King:
Automatic accent classification of foreign accented Australian English speech.
- Filipp Korkmazskiy, Biing-Hwang Juang:
Discriminative adaptation for speaker verification.
- Verna Stockmal, D. Muljani, Zinny S. Bond:
Perceptual features of unknown foreign languages as revealed by multi-dimensional scaling.
- Kin Yu, John S. Mason:
On-line incremental adaptation for speaker verification using maximum likelihood estimates of CDHMM parameters.
- Dominique Genoud, Frédéric Bimbot, Guillaume Gravier, Gérard Chollet:
Combining methods to improve speaker verification decision.
- Cesar Martín del Alamo, J. Alvarez, C. de la Torre, F. J. Poyatos, Lis Hernández:
Incremental speaker adaptation with minimum error discriminative training for speaker identification.
- Konstantin P. Markov, Seiichi Nakagawa:
Frame level likelihood normalization for text-independent speaker identification using Gaussian mixture models.
- Ann E. Thymé-Gobbel, Sandra E. Hutchins:
On using prosodic cues in automatic language identification.
- Tadashi Kitamura, Shinsai Takei:
Speaker recognition model using two-dimensional mel-cepstrum and predictive neural network.
- HingKeung Kwan, Keikichi Hirose:
Unknown language rejection in language identification system.
- James Hieronymus, Shubha Kadambe:
Spoken language identification using large vocabulary speech recognition.
- Carlos Teixeira, Isabel Trancoso, António Joaquim Serralheiro:
Accent identification.
- Sarel van Vuuren:
Comparison of text-independent speaker recognition methods on telephone speech with acoustic mismatch.
- Xue Yang, J. Bruce Millar, Iain MacLeod:
On the sources of inter- and intra-speaker variability in the acoustic dynamics of speech.
- Kay M. Berkling, Etienne Barnard:
Language identification with inaccurate string matching.
- Michael J. Carey, Eluned S. Parris, Harvey Lloyd-Thomas, S. J. Bennett:
Robust prosodic features for speaker identification.
- Enric Monte, Javier Hernando Pericas, Xavier Miró, A. Adolf:
Text independent speaker identification on noisy environments by means of self organizing maps.
- Paul Dalsgaard, Ove Andersen, Hanne Hesselager, Bojan Petek:
Language identification using language-dependent phonemes and language-independent speech units.
Emotion in Recognition and Synthesis
Stochastic Techniques in Robust Speech Recognition
- Chin-Hui Lee, Biing-Hwang Juang, Wu Chou, J. J. Molina-Perez:
A study on task-independent subword selection and modeling for speech recognition.
- Mazin G. Rahim, Chin-Hui Lee:
Simultaneous ANN feature and HMM recognizer design using string-based minimum classification error (MCE) training.
- Sunil K. Gupta, Frank K. Soong, Raziel Haimi-Cohen:
Quantizing mixture-weights in a tied-mixture HMM.
- M. J. F. Gales, D. Pye, Philip C. Woodland:
Variance compensation within the MLLR framework for robust speech recognition and speaker adaptation.
- Arun C. Surendran, Chin-Hui Lee, Mazin G. Rahim:
Maximum-likelihood stochastic matching approach to non-linear equalization for robust speech recognition.
- Jen-Tzung Chien, Hsiao-Chuan Wang, Lee-Min Lee:
Estimation of channel bias for telephone speech recognition.
Prosodic Synthesis in Text to Speech
Dialogue Events
Databases and Tools
- Peter Roach, Simon Arnfield, William J. Barry, J. Baltova, Marian Boldea, Adrian Fourcin, W. Gonet, Ryszard Gubrynowicz, E. Hallum, Lori Lamel, Krzysztof Marasek, Alain Marchal, Einar Meister, Klára Vicsi:
BABEL: an Eastern European multi-language database.
- Ren-Hua Wang, Deyu Xia, Jinfu Ni, Bicheng Liu:
USTC95 - a Putonghua corpus.
- Edward Hurley, Joseph Polifroni, James R. Glass:
Telephone data collection using the World Wide Web.
- M. Falcone, A. Gallo:
The "SIVA" speech database for speaker verification: description and evaluation.
- Christoph Draxler:
A multi-level description of date expressions in German telephone speech.
- Robert H. Halstead Jr., Ben Serridge, Jean-Manuel Van Thong, William Goldenthal:
Viterbi search visualization using Vista: a generic performance visualization tool.
- Toomas Altosaar, Matti Karjalainen, Martti Vainio:
A multilingual phonetic representation and analysis system for different speech databases.
- Detlev Langmann, Reinhold Haeb-Umbach, Lou Boves, Els den Os:
FRESCO: the French telephone speech data collection - part of the European SpeechDat(M) project.
- Johannes Müller, Holger Stahl, Manfred Lang:
Predicting the out-of-vocabulary rate and the required vocabulary size for speech processing applications.
- Nathalie Parlangeau, Alain Marchal:
AMULET: automatic MUltisensor speech labelling and event tracking: study of the spatio-temporal correlations in voiceless plosive production.
- Minsoo Hahn, Sanghun Kim, Jung-Chul Lee, Yong-Ju Lee:
Constructing multi-level speech database for spontaneous speech processing.
- Marian Boldea, Alin Doroga, Tiberiu Dumitrescu, Maria Pescaru:
Preliminaries to a Romanian speech database.
- Klaus J. Kohler:
Labelled data bank of spoken standard German: the Kiel corpus of read/spontaneous speech.
- I. Lee Hetherington, Michael K. McCandless:
SAPPHIRE: an extensible speech analysis and recognition tool based on Tcl/Tk.
- Jiro Kiyama, Yoshiaki Itoh, Ryuichi Oka:
Automatic detection of topic boundaries and keywords in arbitrary speech using incremental reference interval-free continuous DP.
- Bo-Ren Bai, Lee-Feng Chien, Lin-Shan Lee:
Very-large-vocabulary Mandarin voice message file retrieval using speech queries.
- Håkan Melin:
Gandalf - a Swedish telephone speaker verification database.
- Ellen Gurman Bard, Catherine Sotillo, Anne H. Anderson, M. M. Taylor:
The DCIEM Map Task corpus: spontaneous dialogue under sleep deprivation and drug treatment.
- Xavier Menéndez-Pidal, James B. Polikoff, Shirley M. Peters, Jennie E. Leonzio, H. Timothy Bunnell:
The Nemours database of dysarthric speech.
- Jean Hennebert, Dijana Petrovska-Delacrétaz:
POST: parallel object-oriented speech toolkit.
Robust Speech Processing
Dialects and Speaking Styles
Production and Perception of Prosody
Topics in ASR and Search
- Joerg P. Ueberla, I. R. Gransden:
Clustered language models with context-equivalent states.
- Yuji Yonezawa, Masato Akagi:
Modeling of contextual effects and its application to word spotting.
- Jochen Junkawitsch, L. Neubauer, Harald Höge, Günther Ruske:
A new keyword spotting algorithm with pre-calculated optimal thresholds.
- Roxane Lacouture, Yves Normandin:
Detection of ambiguous portions of signal corresponding to OOV words or misrecognized portions of input.
- Fabio Brugnara, Marcello Federico:
Techniques for approximating a trigram language model.
- Keizaburo Takagi, Koichi Shinoda, Hiroaki Hattori, Takao Watanabe:
Unsupervised and incremental speaker adaptation under adverse environmental conditions.
- Hugo Van hamme, Filip Van Aelten:
An adaptive-beam pruning technique for continuous speech recognition.
- Carlos Avendaño, Sarel van Vuuren, Hynek Hermansky:
Data based filter design for RASTA-like channel normalization in ASR.
- Stefan Ortmanns, Hermann Ney, Frank Seide, I. Lindam:
A comparison of time conditioned and word conditioned search techniques for large vocabulary speech recognition.
- Stefan Ortmanns, Hermann Ney, A. Eiden:
Language-model look-ahead for large vocabulary speech recognition.
- Jean-Luc Husson, Yves Laprie:
A new search algorithm in segmentation lattices of speech signals.
- Tomokazu Yamada, Shigeki Sagayama:
LR-parser-driven Viterbi search with hypotheses merging mechanism using context-dependent phone models.
- Jan Nouza:
Discrete-utterance recognition with a fast match based on total data reduction.
- F. Javier Caminero-Gil, C. de la Torre, Luis Villarrubia, Cesar Martín del Alamo, Lis Hernández:
On-line garbage modeling with discriminant analysis for utterance verification.
- Paul Placeway, John D. Lafferty:
Cheating with imperfect transcripts.
- Naoto Iwahashi:
Novel training method for classifiers used in speaker adaptation.
- Katsuki Minamino:
Large vocabulary word recognition based on a graph-structured dictionary.
- Bach-Hiep Tran, Frank Seide, Volker Steinbiss:
A word graph based n-best search in continuous speech recognition.
- David M. Goblirsch:
Viterbi beam search with layered bigrams.
- Eric Buhrke, Wu Chou, Qiru Zhou:
A wave decoder for continuous speech recognition.
- Eric Thelen:
Long term on-line speaker adaptation for large vocabulary dictation.
- Gerhard Sagerer, Heike Rautenstrauch, Gernot A. Fink, Bernd Hildebrandt, A. Jusek, Franz Kummert:
Incremental generation of word graphs.
- Irina Illina, Yifan Gong:
Improvement in n-best search for continuous speech recognition.
- Antonio Bonafonte, José B. Mariño, Albino Nogueiras:
Sethos: the UPC speech understanding system.
- Pietro Laface, Luciano Fissore, A. Maro, Franco Ravera:
Segmental search for continuous speech recognition.
Multimodal Dialogue/HCI
- A. P. Breen, E. Bowers, W. Welsh:
An investigation into the generation of mouth shapes for a talking head.
- Bertrand Le Goff, Christian Benoît:
A text-to-audiovisual-speech synthesizer for French.
- Yuri Iwano, Shioya Kageyama, Emi Morikawa, Shu Nakazato, Katsuhiko Shirai:
Analysis of head movements and its role in spoken dialogue.
- Satoru Hayamizu, Osamu Hasegawa, Katunobu Itou, Katsuhiko Sakaue, Kazuyo Tanaka, Shigeki Nagaya, Masayuki Nakazawa, T. Endoh, Fumio Togawa, Kenji Sakamoto, Kazuhiko Yamamoto:
RWC multimodal database for interactions by integration of spoken language and visual information.
- Christian Cavé, Isabelle Guaïtella, Roxane Bertrand, Serge Santi, Françoise Harlay, Robert Espesser:
About the relationship between eyebrow movements and F0 variations.
- Laurel Fais, Kyung-ho Loken-Kim, Tsuyoshi Morimoto:
How many words is a picture really worth?
- A. Lagana, F. Lavagetto, A. Storace:
Visual synthesis of source acoustic speech through Kohonen neural networks.
- Helena M. Saldaña, David B. Pisoni, Jennifer M. Fellowes, Robert E. Remez:
Audio-visual speech perception without speech cues.
Multilingual Speech Processing
- Jim Barnett, Andrés Corrada, G. Gao, Larry Gillick, Yoshiko Ito, Steve Lowe, Linda Manganaro, Barbara Peskin:
Multilingual speech recognition at Dragon Systems.
- Joachim Köhler:
Multi-lingual phoneme recognition exploiting acoustic-phonetic similarities of sounds.
- Atsushi Nakamura, Shoichi Matsunaga, Tohru Shimizu, Masahiro Tonomura, Yoshinori Sagisaka:
Japanese speech databases for robust speech recognition.
- Lori Lamel, Martine Adda-Decker, Jean-Luc Gauvain, Gilles Adda:
Spoken language processing in a multilingual context.
- Victor Zue, Stephanie Seneff, Joseph Polifroni, Helen M. Meng, James R. Glass:
Multilingual human-computer interactions: from information access to language learning.
- Ulla Ackermann, Bianca Angelini, Fabio Brugnara, Marcello Federico, Diego Giuliani, Roberto Gretter, Gianni Lazzari, Heinrich Niemann:
Speedata: multilingual spoken data entry.
- Hiyan Alshawi:
Head automata for speech translation.
- Ye-Yi Wang, John D. Lafferty, Alex Waibel:
Word clustering with parallel spoken language corpora.
- Jae-Woo Yang, Youngjik Lee:
Toward translating Korean speech into other languages.
- Thomas Bub, Johannes Schwinn:
VERBMOBIL: the evolution of a complex large speech-to-speech translation system.
- Alon Lavie, Alex Waibel, Lori S. Levin, Donna Gates, Marsal Gavaldà, Torsten Zeppenfeld, Puming Zhan, Oren Glickman:
Translation of conversational speech with JANUS-II.
Acoustics in Synthesis
Pitch and Rate
General ASR Posters
- Puming Zhan, Klaus Ries, Marsal Gavaldà, Donna Gates, Alon Lavie, Alex Waibel:
JANUS-II: towards spontaneous Spanish speech recognition.
- Kris Demuynck, Jacques Duchateau, Dirk Van Compernolle:
Reduced semi-continuous models for large vocabulary continuous speech recognition in Dutch.
- Andrei Constantinescu, Olivier Bornet, Gilles Caloz, Gérard Chollet:
Validating different flexible vocabulary approaches on the Swiss French Polyphone and Polyvar databases.
- Néstor Becerra Yoma, Fergus R. McInnes, Mervyn A. Jack:
Use of a reliability coefficient in noise cancelling by neural net and weighted matching algorithms.
- Kazuhiko Ozeki:
Likelihood normalization using an ergodic HMM for continuous speech recognition.
- Laurence Candille, Henri Meloni:
Dynamic control of a production model.
- Hiroaki Hattori, Eiko Yamada:
Speech recognition using sub-word units dependent on phonetic contexts of both training and recognition vocabularies.
- Bruno Jacob, Christine Sénac:
Hidden Markov models merging acoustic and articulatory information to automatic speech recognition.
- Mats Blomberg, Kjell Elenius:
Creation of unseen triphones from diphones and monophones using a speech production approach.
- Bo Xu, Bing Ma, Shuwu Zhang, Fei Qu, Taiyi Huang:
Speaker-independent dictation of Chinese speech with 32k vocabulary.
- J. J. Humphries, Philip C. Woodland, D. Pearce:
Using accent-specific pronunciation modelling for robust speech recognition.
- Tilo Sloboda, Alex Waibel:
Dictionary learning for spontaneous speech recognition.
- Johan de Veth, Lou Boves:
Comparison of channel normalisation techniques for automatic speech recognition over the phone.
- Manuel A. Leandro, José Manuel Pardo:
Anchor point detection for continuous speech recognition in Spanish: the spotting of phonetic events.
- Bhiksha Raj, Evandro Bacci Gouvêa, Pedro J. Moreno, Richard M. Stern:
Cepstral compensation by polynomial approximation for environment-independent speech recognition.
- B. T. Lilly, Kuldip K. Paliwal:
Effect of speech coders on speech recognition performance.
- Léonard Janer, Josep Martí, Climent Nadeu, Eduardo Lleida-Solano:
Wavelet transforms for non-uniform speech recognition systems.
- Tsuyoshi Usagawa, Markus Bodden, Klaus Rateitschek:
A binaural model as a front-end for isolated word recognition.
- Hiroshi G. Okuno, Tomohiro Nakatani, Takeshi Kawabata:
A new speech enhancement: speech stream segregation.
Data-based Synthesis
- Andrew Slater, John Coleman:
Non-segmental analysis and synthesis based on a speech database.
- Ralf Benzmüller, William J. Barry:
Microsegment synthesis - economic principles in a low-cost solution.
- X. D. Huang, Alex Acero, J. Adcock, Hsiao-Wuen Hon, J. Goldsmith, J. Liu, Mike Plumpe:
Whistler: a trainable text-to-speech system.
- Thomas Portele, Karlheinz Stöber, Horst Meyer, Wolfgang Hess:
Generation of multiple synthesis inventories by a bootstrapping procedure.
- Bernd Möbius, Jan P. H. van Santen:
Modeling segmental duration in German text-to-speech synthesis.
- Nick Campbell:
Autolabelling Japanese ToBI.
Speaker Identification and Verification
- S. Parthasarathy, Aaron E. Rosenberg:
General phrase speaker verification using sub-word background models and likelihood-ratio scoring.
- J. Murakami, M. Sugiyama, H. Watanabe:
Unknown-multiple signal source clustering problem using ergodic HMM and applied to speaker classification.
- Jean-Luc Le Floch, Claude Montacié, Marie-José Caraty:
GMM and ARVM cooperation and competition for text-independent speaker recognition on telephone speech.
- Qiguang Lin, Ea-Ee Jan, ChiWei Che, Dong-Suk Yuk, James L. Flanagan:
Selective use of the speech spectrum and a VQGMM method for speaker identification.
- Michael Newman, Larry Gillick, Yoshiko Ito, Don McAllaster, Barbara Peskin:
Speaker verification through large vocabulary continuous speech recognition.
- Andrea Paoloni, Susanna Ragazzini, Giacomo Ravaioli:
Predictive neural networks in text independent speaker verification: an evaluation on the SIVA database.
Acoustic Phonetics
Perception of Vowels and Consonants
- Jialu Zhang:
On the syllable structures of Chinese relating to speech recognition.
- Takashi Otake, Kiyoko Yoneyama:
Can a moraic nasal occur word-initially in Japanese?
- Winifred Strange, Reiko Akahane-Yamada, B. H. Fitzgerald, R. Kubo:
Perceptual assimilation of American English vowels by Japanese listeners.
- Winifred Strange, Ocke-Schwen Bohn, S. A. Trent, M. C. McNair, K. C. Bielec:
Context and speaker effects in the perceptual assimilation of German vowels by American listeners.
- Mohamed Zahid:
Examination of a perceptual non-native speech contrast: pharyngealized/non-pharyngealized discrimination by French-speaking adults.
- Roel Smits:
Context-dependent relevance of burst and transitions for perceived place in stops: it's in production, not perception.
- Ryoji Baba, Kaori Omuro, Hiromitsu Miyazono, Tsuyoshi Usagawa, Masahiko Higuchi:
The perception of morae in long vowels: comparison among Japanese, Korean and English speakers.
- Robin J. Lickley:
Juncture cues to disfluency.
- James R. Sawusch:
Effects of duration and formant movement on vowel perception.
- Neeraj Deshmukh, Richard Duncan, Aravind Ganapathiraju, Joseph Picone:
Benchmarking human performance for continuous speech recognition.
- Takayuki Arai, Misha Pavel, Hynek Hermansky, Carlos Avendaño:
Intelligibility of speech with filtered time trajectories of spectral envelopes.
- Douglas H. Whalen, Sonya M. Sheffert:
Perceptual use of vowel and speaker information in breath sounds.
- Philippe Mousty, Monique Radeau, Ronald Peereman, Paul Bertelson:
The role of neighborhood relative frequency in spoken word recognition.
- James M. McQueen, Mark A. Pitt:
Transitional probability and phoneme monitoring.
- Anne Bonneau:
Identification of vowel features from French stop bursts.
- Zinny S. Bond, Thomas J. Moore, Beverley Gable:
Listening in a second language.
- Denis Burnham, Elizabeth Francis, Di Webster, Sudaporn Luksaneeyanawin, Chayada Attapaiboon, Francisco Lacerda, Peter Keller:
Perception of lexical tone across languages: evidence for a linguistic mode of processing.
- James S. Magnuson, Reiko Akahane-Yamada:
Acoustic correlates to the effects of talker variability on the perception of English /r/ and /l/ by Japanese listeners.