INTERSPEECH 2009: Brighton, UK
INTERSPEECH 2009, 10th Annual Conference of the International Speech Communication Association, Brighton, United Kingdom, September 6-10, 2009. ISCA 2009
Keynotes
Sadaoki Furui: Selected topics from 40 years of research on speech and speaker recognition. 1-8
Thomas L. Griffiths: Connecting human and machine learning via probabilistic models of cognition. 9-12
Deb Roy: New horizons in the study of child language acquisition. 13-20
Mari Ostendorf: Transcribing human-directed speech for spoken language processing. 21-27
ASR: Features for Noise Robustness
Chanwoo Kim, Richard M. Stern: Feature extraction for robust speech recognition using a power-law nonlinearity and power-bias subtraction. 28-31
Yu-Hsiang Bosco Chiu, Bhiksha Raj, Richard M. Stern: Towards fusion of feature extraction and acoustic model training: a top down process for robust speech recognition. 32-35
Luz García, Roberto Gemello, Franco Mana, José C. Segura: Progressive memory-based parametric non-linear feature equalization. 40-43
Osamu Ichikawa, Takashi Fukuda, Ryuki Tachibana, Masafumi Nishimura: Dynamic features in the linear domain for robust automatic speech recognition in a reverberant environment. 44-47
Antonio Miguel, Alfonso Ortega, Luis Buera, Eduardo Lleida: Local projections and support vector based feature selection in speech recognition. 48-51
Production: Articulatory Modelling
Qiang Fang, Akikazu Nishikido, Jianwu Dang, Aijun Li: Feedforward control of a 3D physiological articulatory model for vowel production. 52-55
Jun Cai, Yves Laprie, Julie Busset, Fabrice Hirsch: Articulatory modeling based on semi-polar coordinates and guided PCA technique. 56-59
Xiao Bo Lu, William Thorpe, Kylie Foster, Peter Hunter: From experiments to articulatory motion - a three dimensional talking head model. 64-67
Takayuki Arai: Sliding vocal-tract model and its application for vowel production. 72-75
Systems for LVCSR and Rich Transcription
Haihua Xu, Daniel Povey, Jie Zhu, Guanyong Wu: Minimum hypothesis phone error as a decoding method for speech recognition. 76-79
Stefan Kombrink, Lukás Burget, Pavel Matejka, Martin Karafiát, Hynek Hermansky: Posterior-based out of vocabulary word detection in telephone speech. 80-83
Yuya Akita, Masato Mimura, Tatsuya Kawahara: Automatic transcription system for meetings of the Japanese national congress. 84-87
Jonas Lööf, Christian Gollan, Hermann Ney: Cross-language bootstrapping for unsupervised acoustic model training: rapid development of a Polish speech recognition system. 88-91
Alberto Abad, Isabel Trancoso, Nelson Neto, Céu Viana: Porting an European Portuguese broadcast news recognition system to Brazilian Portuguese. 92-95
Julien Despres, Petr Fousek, Jean-Luc Gauvain, Sandrine Gay, Yvan Josse, Lori Lamel, Abdelkhalek Messaoudi: Modeling northern and southern varieties of Dutch for STT. 96-99
Speech Analysis and Processing I-III
Thomas Ewender, Sarah Hoffmann, Beat Pfister: Nearly perfect detection of continuous F0 contour and frame classification for TTS synthesis. 100-103
Yannis Pantazis, Olivier Rosec, Yannis Stylianou: AM-FM estimation for speech based on a time-varying sinusoidal model. 104-107
Jon Gudnason, Mark R. P. Thomas, Patrick A. Naylor, Daniel P. W. Ellis: Voice source waveform analysis and synthesis using principal component analysis and Gaussian mixture modelling. 108-111
Jung Ook Hong, Patrick J. Wolfe: Model-based estimation of instantaneous pitch in noisy speech. 112-115
Thomas Drugman, Baris Bozkurt, Thierry Dutoit: Complex cepstrum-based decomposition of speech for glottal source estimation. 116-119
Stephen A. Zahorian, Hongbing Hu, Zhengqing Chen, Jiang Wu: Spectral and temporal modulation features for phonetic recognition. 1071-1074
Ibon Saratxaga, Daniel Erro, Inmaculada Hernáez, Iñaki Sainz, Eva Navas: Use of harmonic phase information for polarity detection in speech signals. 1075-1078
Michael Wohlmayr, Franz Pernkopf: Finite mixture spectrogram modeling for multipitch tracking using a factorial hidden Markov model. 1079-1082
Anthony P. Stark, Kuldip K. Paliwal: Group-delay-deviation based spectral analysis of speech. 1083-1086
Joseph M. Anand, B. Yegnanarayana, Sanjeev Gupta, M. R. Kesheorey: Speaker dependent mapping for low bit rate coding of throat microphone speech. 1087-1090
G. Bapineedu, B. Avinash, Suryakanth V. Gangashetty, B. Yegnanarayana: Analysis of Lombard speech using excitation source information. 1091-1094
Andrew Errity, John McKenna: A comparison of linear and nonlinear dimensionality reduction methods applied to synthetic speech. 1095-1098
Christian Fischer Pedersen, Ove Andersen, Paul Dalsgaard: ZZT-domain immiscibility of the opening and closing phases of the LF GFM under frame length variations. 1099-1102
Hongjun Sun, Jianhua Tao, Huibin Jia: Dimension reducing of LSF parameters based on radial basis function neural network. 1103-1106
A. N. Harish, D. Rama Sanand, Srinivasan Umesh: Characterizing speaker variability using spectral envelopes of vowel sounds. 1107-1110
Tharmarajah Thiruvaran, Eliathamby Ambikairajah, Julien Epps: Analysis of band structures for speaker-specific information in FM feature extraction. 1111-1114
Karl Schnell, Arild Lacroix: Artificial nasalization of speech sounds based on pole-zero models of spectral relations between mouth and nose signals. 1115-1118
Andrew Hines, Naomi Harte: Error metrics for impaired auditory nerve responses of different phoneme groups. 1119-1122
Chatchawarn Hansakunbuntheung, Hiroaki Kato, Yoshinori Sagisaka: Model-based automatic evaluation of L2 learner's English timing. 2871-2874
Petko N. Petkov, Iman S. Mossavat, W. Bastiaan Kleijn: A Bayesian approach to non-intrusive quality assessment of speech. 2875-2878
Ladan Baghai-Ravary, Greg Kochanski, John Coleman: Precision of phoneme boundaries derived using hidden Markov models. 2879-2882
Lakshmish Kaushik, Douglas D. O'Shaughnessy: A novel method for epoch extraction from speech signals. 2883-2886
Jia Min Karen Kua, Julien Epps, Eliathamby Ambikairajah, Eric H. C. Choi: LS regularization of group delay features for speaker recognition. 2887-2890
Thomas Drugman, Thierry Dutoit: Glottal closure and opening instant detection from speech signals. 2891-2894
Speech Perception I, II
Masashi Ito, Keiji Ohara, Akinori Ito, Masafumi Yano: Relative importance of formant and whole-spectral cues for vowel perception. 124-127
Chihiro Takeshima, Minoru Tsuzaki, Toshio Irino: Influences of vowel duration on speaker-size estimation and discrimination. 128-131
Václav Jonás Podlipský, Radek Skarnitzl, Jan Volín: High front vowels in Czech: a contrast in quantity or quality? 132-135
Marjorie Dole, Michel Hoen, Fanny Meunier: Effect of contralateral noise on energetic and informational masking on speech-in-speech intelligibility. 136-139
Heidi Christensen, Jon Barker: Using location cues to track speaker changes from mobile, binaural microphones. 140-143
Ioana Vasilescu, Martine Adda-Decker, Lori Lamel, Pierre A. Hallé: A perceptual investigation of speech transcription errors involving frequent near-homophones in French and American English. 144-147
Etienne Gaudrain, Su Li, Vin Shen Ban, Roy D. Patterson: The role of glottal pulse rate and vocal tract length in the perception of speaker identity. 148-151
Victoria Medina, Willy Serniclaes: Development of voicing categorization in deaf children with cochlear implant. 152-155
Annie Tremblay: Processing liaison-initial words in native and non-native French: evidence from eye movements. 156-159
Nigel G. Ward, Benjamin H. Walker: Estimating the potential of signal and interlocutor-track information for language modeling. 160-163
Zhanyu Ma, Arne Leijon: Human audio-visual consonant recognition analyzed with three bimodal integration models. 812-815
Hanny den Ouden, Hugo Quené: Effects of tempo in radio commercials on young and elderly listeners. 816-819
Sofia Strömbergsson: Self-voice recognition in 4 to 5-year-old children. 820-823
Carmen Peláez-Moreno, Ana I. García-Moral, Francisco J. Valverde-Albacete: Eliciting a hierarchical structure of human consonant perception task errors using formal concept analysis. 828-831
Takeshi Saitou, Masataka Goto: Acoustic and perceptual effects of vocal training in amateur male singing. 832-835
Accent and Language Recognition
Florian Verdet, Driss Matrouf, Jean-François Bonastre, Jean Hennebert: Factor analysis and SVM for language recognition. 164-167
Sabato Marco Siniscalchi, Jeremy Reed, Torbjørn Svendsen, Chin-Hui Lee: Exploring universal attribute characterization of spoken languages for spoken language recognition. 168-171
Abhijeet Sangwan, John H. L. Hansen: On the use of phonological features for automatic accent analysis. 172-175
Fabio Castaldo, Sandro Cumani, Pietro Laface, Daniele Colibro: Language recognition using language factors. 176-179
Je Hun Jeon, Yang Liu: Automatic accent detection: effect of base units and boundary information. 180-183
Ron M. Hecht, Omer Hezroni, Amit Manna, Ruth Aloni-Lavi, Gil Dobry, Amir Alfandary, Yaniv Zigel: Age verification using a hybrid speech processing approach. 184-187
Ron M. Hecht, Omer Hezroni, Amit Manna, Gil Dobry, Yaniv Zigel, Naftali Tishby: Information bottleneck based age verification. 188-191
Fred S. Richardson, William M. Campbell, Pedro A. Torres-Carrasquillo: Discriminative n-gram selection for dialect recognition. 192-195
Linsen Loots, Thomas Niesler: Data-driven phonetic comparison and conversion between South African, British and American English pronunciations. 196-199
Rong Tong, Bin Ma, Haizhou Li, Engsiong Chng, Kong-Aik Lee: Target-aware language models for spoken language recognition. 200-203
Daniel Chung Yong Lim, Ian R. Lane: Language identification for speech-to-speech translation. 204-207
Fadi Biadsy, Julia Hirschberg: Using prosody and phonotactics in Arabic dialect identification. 208-211
ASR: Acoustic Model Training and Combination
Pierre L. Dognin, John R. Hershey, Vaibhava Goel, Peder A. Olsen: Refactoring acoustic models using variational expectation-maximization. 212-215
Georg Heigold, David Rybach, Ralf Schlüter, Hermann Ney: Investigations on convex optimization using log-linear HMMs for digit string recognition. 216-219
Janne Pylkkönen: Investigations on discriminative training in large scale acoustic model estimation. 220-223
Erik McDermott, Shinji Watanabe, Atsushi Nakamura: Margin-space integration of MPE loss via differencing of MMI functionals for generalized error-weighted discriminative training. 224-227
Etienne Marcheret, Jia-Yu Chen, Petr Fousek, Peder A. Olsen, Vaibhava Goel: Compacting discriminative feature space transforms for embedded devices. 228-231
Hung-An Chang, James R. Glass: A back-off discriminative acoustic model for automatic speech recognition. 232-235
Junho Park, Frank Diehl, Mark J. F. Gales, Marcus Tomalin, Philip C. Woodland: Efficient generation and use of MLP features for Arabic speech recognition. 236-239
Xiaodong Cui, Jian Xue, Bing Xiang, Bowen Zhou: A study of bootstrapping with multiple acoustic features for improved automatic speech recognition. 240-243
Björn Hoffmeister, Ruoying Liang, Ralf Schlüter, Hermann Ney: Log-linear model combination with word-dependent scaling factors. 248-251
Spoken Dialogue Systems
Kyoko Matsuyama, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno: Enabling a user to specify an item at any time during system enumeration - item identification for barge-in-able conversational dialogue systems. 252-255
Tomoyuki Yamagata, Tetsuya Takiguchi, Yasuo Ariki: System request detection in human conversation based on multi-resolution Gabor wavelet features. 256-259
Stefan Schwärzler, Stefan Maier, Joachim Schenk, Frank Wallhoff, Gerhard Rigoll: Using graphical models for mixed-initiative dialog management systems with realtime policies. 260-263
Shinya Fujie, Yoichi Matsuyama, Hikaru Taniyama, Tetsunori Kobayashi: Conversation robot participating in and activating a group communication. 264-267
Chiori Hori, Kiyonori Ohtake, Teruhisa Misu, Hideki Kashioka, Satoshi Nakamura: Recent advances in WFST-based dialog system. 268-271
David Griol, Giuseppe Riccardi, Emilio Sanchis: A statistical dialog manager for the LUNA project. 272-275
Heriberto Cuayáhuitl, Juventino Montiel-Hernández: A policy-switching learning approach for adaptive spoken dialogue agents. 276-279
Luis Fernando D'Haro, Ricardo de Córdoba, Rubén San Segundo, Javier Macías Guarasa, José Manuel Pardo: Strategies for accelerating the design of dialogue applications using heuristic information from the backend database. 280-283
Florian Pinault, Fabrice Lefèvre, Renato de Mori: Feature-based summary space for stochastic dialogue modeling with hierarchical semantic frames. 284-287
Rajesh Balchandran, Leonid Rachevsky, Larry Sansone: Language modeling and dialog management for address recognition. 288-291
Ea-Ee Jan, Hong-Kwang Kuo, Osamuyimen Stewart, David Lubensky: A framework for rapid development of conversational natural language call routing systems for call centers. 292-295
Jonas Beskow, Jens Edlund, Björn Granström, Joakim Gustafson, Gabriel Skantze, Helena Tobiasson: The MonAMI reminder: a spoken dialogue system for face-to-face interaction. 296-299
Julia Seebode, Stefan Schaffer, Ina Wechsung, Florian Metze: Influence of training on direct and indirect measures for the evaluation of multimodal systems. 300-303
Christine Kühnel, Benjamin Weiss, Sebastian Möller: Talking heads for interacting with spoken dialog smart-home systems. 304-307
Aki Kunikoshi, Yu Qiao, Nobuaki Minematsu, Keikichi Hirose: Speech generation from hand gestures based on space mapping. 308-311
Special Session: INTERSPEECH 2009 Emotion Challenge
Santiago Planet, Ignasi Iriondo Sanz, Joan Claudi Socoró, Carlos Monzo, Jordi Adell: GTM-URL contribution to the INTERSPEECH 2009 emotion challenge. 316-319
Chi-Chun Lee, Emily Mower, Carlos Busso, Sungbok Lee, Shrikanth S. Narayanan: Emotion recognition using a hierarchical binary decision tree approach. 320-323
Elif Bozkurt, Engin Erzin, Çigdem Eroglu Erdem, A. Tanju Erdem: Improving automatic emotion recognition from speech signals. 324-327
Thurid Vogt, Elisabeth André: Exploring the benefits of discretization of acoustic features for speech emotion recognition. 328-331
Iker Luengo, Eva Navas, Inmaculada Hernáez: Combining spectral and prosodic information for emotion recognition in the INTERSPEECH 2009 emotion challenge. 332-335
Roberto Barra-Chicote, Fernando Fernández-Martínez, Syaheerah L. Lutfi, Juan Manuel Lucas-Cuesta, Javier Macías Guarasa, Juan Manuel Montero, Rubén San Segundo, José Manuel Pardo: Acoustic emotion recognition using dynamic Bayesian networks and multi-space distributions. 336-339
Tim Polzehl, Shiva Sundaram, Hamed Ketabdar, Michael Wagner, Florian Metze: Emotion classification in children's speech using fusion of acoustic and linguistic features. 340-343
Pierre Dumouchel, Najim Dehak, Yazid Attabi, Réda Dehak, Narjès Boufaden: Cepstral and long-term features for emotion recognition. 344-347
Marcel Kockmann, Lukás Burget, Jan Cernocký: Brno University of Technology system for INTERSPEECH 2009 emotion challenge. 348-351
Automatic Speech Recognition: Language Models I, II
Boulos Harb, Ciprian Chelba, Jeffrey Dean, Sanjay Ghemawat: Back-off language model compression. 352-355
Tobias Kaufmann, Thomas Ewender, Beat Pfister: Improving broadcast news transcription with a precision grammar and discriminative reranking. 356-359
Xunying Liu, Mark J. F. Gales, Philip C. Woodland: Use of contexts in language model interpolation and adaptation. 360-363
Jim L. Hieronymus, Xunying Liu, Mark J. F. Gales, Philip C. Woodland: Exploiting Chinese character models to improve speech recognition performance. 364-367
Gwénolé Lecorvé, Guillaume Gravier, Pascale Sébillot: Constraint selection for topic-based MDI adaptation of language models. 368-371
Chuang-Hua Chueh, Jen-Tzung Chien: Nonstationary latent Dirichlet allocation for speech recognition. 372-375
Sopheap Seng, Laurent Besacier, Brigitte Bigi, Eric Castelli: Multiple text segmentation for statistical language modeling. 2663-2666
Langzhou Chen, K. K. Chin, Kate Knill: Improved language modelling using bag of word pairs. 2671-2674
Frank Diehl, Mark J. F. Gales, Marcus Tomalin, Philip C. Woodland: Morphological analysis and decomposition for Arabic speech-to-text systems. 2675-2678
Amr El-Desoky, Christian Gollan, David Rybach, Ralf Schlüter, Hermann Ney: Investigating the use of morphological decomposition and diacritization for improving Arabic LVCSR. 2679-2682
Welly Naptali, Masatoshi Tsuchiya, Seiichi Nakagawa: Topic dependent language model based on topic voting on noun history. 2683-2686
Péter Mihajlik, Balázs Tarján, Zoltán Tüske, Tibor Fegyó: Investigation of morph-based speech recognition improvements across speech genres. 2687-2690
Kengo Ohta, Masatoshi Tsuchiya, Seiichi Nakagawa: Effective use of pause information in language modelling for speech recognition. 2691-2694
Songfang Huang, Steve Renals: A parallel training algorithm for hierarchical Pitman-Yor process language models. 2695-2698
Stanislas Oger, Vladimir Popescu, Georges Linarès: Probabilistic and possibilistic language models based on the world wide web. 2699-2702
Phoneme-Level Perception
Jack C. Rogers, Matthew H. Davis: Categorical perception of speech without stimulus repetition. 376-379
Anne Cutler, Chris Davis, Jeesun Kim: Non-automaticity of use of orthographic knowledge in phoneme evaluation. 380-383
Meghan Sumner: Learning and generalization of novel contrastive cues. 384-387
Einar Meister, Stefan Werner: Vowel category perception affected by microdurational variations. 388-391
Nandini Iyer, Douglas Brungart, Brian D. Simpson: Perceptual grouping of alternating word pairs: effect of pitch difference and presentation rate. 392-395
Titia Benders, Paul Boersma: Comparing methods to find a best exemplar in a multidimensional space. 396-399
Statistical Parametric Synthesis I, II
Cheng-Cheng Wang, Zhen-Hua Ling, Li-Rong Dai: Asynchronous F0 and spectrum modeling for HMM-based speech synthesis. 404-407
Yao Qian, Frank K. Soong, Miaomiao Wang, Zhizheng Wu: A minimum v/u error approach to F0 generation in HMM-based TTS. 408-411
Shiyin Kang, Zhiwei Shuang, Quansheng Duan, Yong Qin, Lianhong Cai: Voiced/unvoiced decision algorithm for HMM-based speech synthesis. 412-415
Xavi Gonzalvo, Alexander Gutkin, Joan Claudi Socoró, Ignasi Iriondo Sanz, Paul Taylor: Local minimum generation error criterion for hybrid HMM speech synthesis. 416-419
Junichi Yamagishi, Bela Usabaev, Simon King, Oliver Watts, John Dines, Jilei Tian, Rile Hu, Yong Guan, Keiichiro Oura, Keiichi Tokuda, Reima Karhila, Mikko Kurimo: Thousands of voices for HMM-based speech synthesis. 420-423
Systems for Spoken Language Translation
Sylvain Raybaud, David Langlois, Kamel Smaïli: Efficient combination of confidence measures for machine translation. 424-427
David Stallard, Stavros Tsakalidis, Shirin Saleem: Incremental dialog clustering for speech-to-speech translation. 428-431
Ruhi Sarikaya, Sameer Maskey, R. Zhang, Ea-Ee Jan, D. Wang, Bhuvana Ramabhadran, Salim Roukos: Iterative sentence-pair extraction from quasi-parallel corpora for machine translation. 432-435
Juan M. Huerta, Cheng Wu, Andrej Sakrajda, Sasha Caskey, Ea-Ee Jan, Alexander Faisman, Shai Ben-David, Wen Liu, Antonio Lee, Osamuyimen Stewart, Michael Frissora, David Lubensky: RTTS: towards enterprise-level real-time speech transcription and translation services. 436-439
Jing Zheng, Necip Fazil Ayan, Wen Wang, David Burkett: Using syntax in large-scale audio document translation. 440-443
Andreas Tsiartas, Prasanta Kumar Ghosh, Panayiotis G. Georgiou, Shrikanth S. Narayanan: Context-driven automatic bilingual movie subtitle alignment. 444-447
Human Speech Production I, II
Odile Bagou, Violaine Michel, Marina Laganaro: On the production of sandhi phenomena in French: psycholinguistic and acoustic data. 452-455
Chierh Cheng, Yi Xu: Extreme reductions: contraction of disyllables into monosyllables in Taiwan Mandarin. 456-459
Mitchell Peabody, Stephanie Seneff: Annotation and features of non-native Mandarin tone quality. 460-463
Katerina Chládková, Paul Boersma, Václav Jonás Podlipský: On-line formant shifting as a function of F0. 464-467
Kimiko Yamakawa, Shigeaki Amano, Shuichi Itahashi: Production boundary between fricative and affricate in Japanese and Korean speakers. 468-471
Cátia M. R. Pinho, Luis M. T. Jesus, Anna Barney: Aerodynamics of fricative production in European Portuguese. 472-475
Anne Bonneau, Julie Buquet, Brigitte Wrobel-Dautcourt: Contextual effects on protrusion and lip opening for /i, y/. 476-479
Catarina Oliveira, Paula Martins, António J. S. Teixeira: Speech rate effects on European Portuguese nasal vowels. 480-483
Tamás Gábor Csapó, Zsuzsanna Bárkányi, Tekla Etelka Gráczi, Tamás Bohm, Steven M. Lulich: Relation of formants and subglottal resonances in Hungarian vowels. 484-487
Takayuki Arai: Simple physical models of the vocal tract for education in speech science. 756-759
Tokihiko Kaburagi, Katsunori Daimo, Shogo Nakamura: Voice production model employing an interactive boundary-layer analysis of glottal flow. 764-767
Matt Speed, Damian T. Murphy, David M. Howard: Characteristics of two-dimensional finite difference techniques for vocal tract analysis and voice synthesis. 768-771
Christian Kroos: Using sensor orientation information for computational head stabilisation in 3D electromagnetic articulography (EMA). 776-779
Laura Enflo, Johan Sundberg, Friedemann Pabst: Collision threshold pressure before and after vocal loading. 780-783
Elke Philburn: Gender differences in the realization of vowel-initial glottalization. 784-787
Hayo Terband, Frits van Brenk, Pascal van Lieshout, Lian Nijland, Ben Maassen: Stability and composition of functional synergies for speech movements in children and adults. 788-791
Frits van Brenk, Hayo Terband, Pascal van Lieshout, Anja Lowit, Ben Maassen: An analysis of speech rate strategies in aging. 792-795
Stefan Benus: Variability and stability in collaborative dialogues: turn-taking and filled pauses. 796-799
Prosody, Text Analysis, and Multilingual Models
Harald Romsdorfer: Polyglot speech prosody control. 488-491
Harald Romsdorfer: Weighted neural network ensemble models for speech prosody control. 492-495
Vataya Boonpiam, Anocha Rugchatjaroen, Chai Wutiwiwatchai: Cross-language F0 modeling for under-resourced tonal languages: a case study on Thai-Mandarin. 496-499
Dafydd Gibbon, Pramod Pandey, D. Mary Kim Haokip, Jolanta Bachan: Prosodic issues in synthesising Thadou, a Tibeto-Burman tone language. 500-503
Chen-Yu Chiang, Sin-Horng Chen, Yih-Ru Wang: Advanced unsupervised joint prosody labeling and modeling for Mandarin speech and its application to prosody generation for TTS. 504-507
Ausdang Thangthai, Anocha Rugchatjaroen, Nattanun Thatphithakkul, Ananlada Chotimongkol, Chai Wutiwiwatchai: Optimization of t-tilt F0 modeling. 508-511
Nicolas Obin, Xavier Rodet, Anne Lacheret-Dujour: A multi-level context-dependent prosodic model applied to durational modeling. 512-515
Alexandre Trilla, Francesc Alías: Sentiment classification in English from sentence-level annotations of emotions regarding models of affect. 516-519
Leonardo Badino, J. Sebastian Andersson, Junichi Yamagishi, Robert A. J. Clark: Identification of contrast and its emphatic realization in HMM based speech synthesis. 520-523
Antonio Rui Ferreira Rebordão, Shaikh Mostafa Al Masum, Keikichi Hirose, Nobuaki Minematsu: How to improve TTS systems for emotional expressivity. 524-527
Yi-Jian Wu, Yoshihiko Nankaku, Keiichi Tokuda: State mapping based method for cross-lingual speaker adaptation in HMM-based speech synthesis. 528-531
Frederick Weber, Kalika Bali: Real voice and TTS accent effects on intelligibility and comprehension for Indian speakers of English as a second language. 532-535
Pablo Daniel Agüero, Antonio Bonafonte, Juan Carlos Tulli: Improving consistence of phonetic transcription for text-to-speech. 536-539
Automatic Speech Recognition: Adaptation I, II
Piero Cosi: On the development of matched and mismatched Italian children's speech recognition systems. 540-543
Oscar Saz, Eduardo Lleida, Antonio Miguel: Combination of acoustic and lexical speaker adaptation for disordered speech recognition. 544-547
Hwa Jeon Song, Yongwon Jeong, Hyung Soon Kim: Bilinear transformation space-based maximum likelihood linear regression frameworks. 548-551
Yusuke Ijima, Takeshi Matsubara, Takashi Nose, Takao Kobayashi: Speaking style adaptation for spontaneous speech recognition using multiple-regression HMM. 552-555
S. P. Rath, Srinivasan Umesh: Acoustic class specific VTLN-warping using regression class trees. 556-559
Sébastien Demange, Dirk Van Compernolle: Speaker normalization for template based speech recognition. 560-563
Rohit Sinha, Shweta Ghai: On the use of pitch normalization for improving children's speech recognition. 568-571
S. P. Rath, Srinivasan Umesh, Achintya Kumar Sarkar: Using VTLN matrices for rapid and computationally-efficient speaker adaptation with robustness to first-pass transcription errors. 572-575
Koichi Shinoda, Hiroko Murakami, Sadaoki Furui: Speaker adaptation based on two-step active learning. 576-579
Mats Blomberg, Daniel Elenius: Tree-based estimation of speaker characteristics for speech recognition. 580-583
D. Rama Sanand, S. P. Rath, Srinivasan Umesh: A study on the influence of covariance adaptation on Jacobian compensation in vocal tract length normalization. 584-587
Santiago Omar Caballero Morales, Stephen J. Cox: On the estimation and the use of confusion-matrices for improving ASR accuracy. 1599-1602
Shigeki Matsuda, Yu Tsao, Jinyu Li, Satoshi Nakamura, Chin-Hui Lee: A study on soft margin estimation of linear regression parameters for speaker adaptation. 1603-1606
Shweta Ghai, Rohit Sinha: Exploring the role of spectral smoothing in context of children's speech recognition. 1607-1610
Kishan Thambiratnam, Frank Seide: Unsupervised lattice-based acoustic model adaptation for speaker-dependent conversational telephone speech transcription. 1611-1614
Satoshi Kobashikawa, Atsunori Ogawa, Yoshikazu Yamaguchi, Satoshi Takahashi: Rapid unsupervised adaptation using frame independent output probabilities of gender and context independent phoneme models. 1615-1618
Shizhen Wang, Yi-Hui Lee, Abeer Alwan: Bark-shift based nonlinear speaker normalization using the second subglottal resonance. 1619-1622
Applications in Learning and Other Areas
Gregory Aist, Jack Mostow: Designing spoken tutorial dialogue with children to elicit predictable but educationally valuable responses. 588-591
Joost van Doremalen, Helmer Strik, Catia Cucchiarini: Optimizing non-native speech recognition for CALL applications. 592-595
Akinori Ito, Tomoaki Konno, Masashi Ito, Shozo Makino: Evaluation of English intonation based on combination of multiple evaluation scores. 596-599
Andreas Maier, Florian Hönig, Viktor Zeißler, Anton Batliner, E. Körner, N. Yamanaka, P. Ackermann, Elmar Nöth: A language-independent feature set for the automatic evaluation of prosody. 600-603
Klaus Zechner, Derrick Higgins, René Lawless, Yoko Futagi, Sarah Ohls, George Ivanov: Adapting the acoustic model of a speech recognizer for varied proficiency non-native spontaneous speech using read speech with language-specific pronunciation difficulty. 604-607
Dean Luo, Yu Qiao, Nobuaki Minematsu, Yutaka Yamauchi, Keikichi Hirose: Analysis and utilization of MLLR speaker adaptation technique for learners' pronunciation evaluation. 608-611
Miki Iimura, Taichi Sato, Kihachiro Tanaka: Control of human generating force by use of acoustic information - study on onomatopoeic utterances for controlling small lifting-force. 612-615
Géza Németh, Csaba Zainkó, Mátyás Bartalis, Gábor Olaszy, Géza Kiss: Human voice or prompt generation? Can they co-exist in an application? 620-623
Quoc Anh Le, Andrei Popescu-Belis: Automatic vs. human question answering over multimedia meeting recordings. 624-627
Special Session: Silent Speech Interfaces
John F. Holzrichter: Characterizing silent and pseudo-silent speech using radar-like sensors. 628-631
Tomoki Toda, Keigo Nakamura, Takayuki Nagai, Tomomi Kaino, Yoshitaka Nakajima, Kiyohiro Shikano: Technologies for processing body-conducted speech detected with non-audible murmur microphone. 632-635
Jonathan S. Brumberg, Philip R. Kennedy, Frank H. Guenther: Artificial speech synthesizer control by brain-computer interface. 636-639
Thomas Hueber, Elie-Laurent Benaroya, Gérard Chollet, Bruce Denby, Gérard Dreyfus, Maureen Stone: Visuo-phonetic decoding using multi-stream and context-dependent models for an ultrasound-based silent speech interface. 640-643
Yunbin Deng, Rupal Patel, James T. Heaton, Glen Colby, L. Donald Gilmore, Joao Cabrera, Serge H. Roy, Carlo J. De Luca, Geoffrey S. Meltzner: Disordered speech recognition using acoustic and sEMG signals. 644-647
Michael Wand, Szu-Chen Stan Jou, Arthur R. Toth, Tanja Schultz: Impact of different speaking modes on EMG-based speech recognition. 648-651
Arthur R. Toth, Michael Wand, Tanja Schultz: Synthesizing speech from electromyography using voice transformation techniques. 652-655
Viet-Anh Tran, Gérard Bailly, Hélène Loevenbruck, Tomoki Toda: Multimodal HMM-based NAM-to-speech conversion. 656-659
ASR: Discriminative Training
Jonathan Malkin, Amarnag Subramanya, Jeff Bilmes: On the semi-supervised learning of multi-layered perceptrons. 660-663
Roger Hsiao, Tanja Schultz: Generalized discriminative feature transformation for speech recognition. 664-667
Chih-Chieh Cheng, Fei Sha, Lawrence K. Saul: A fast online algorithm for large margin training of continuous density hidden Markov models. 668-671
Dong Yu, Li Deng, Alex Acero: Hidden conditional random field with distribution constraints for phone classification. 676-679
Sayaka Shiota, Kei Hashimoto, Yoshihiko Nankaku, Keiichi Tokuda: Deterministic annealing based training algorithm for Bayesian speech recognition. 680-683
Language Acquisition
Ilana Heintz, Mary E. Beckman, Eric Fosler-Lussier, Lucie Ménard: Evaluating parameters for mapping adult vowels to imitative babbling. 688-691
Chiharu Tsurutani: Intonation of Japanese sentences spoken by English speakers. 692-695
Mark Huckvale, Ian S. Howard, Sascha Fagel: KLAIR: a virtual infant for spoken language acquisition research. 696-699
Joseph Tepperman, Erik Bresch, Yoon-Chul Kim, Sungbok Lee, Louis Goldstein, Shrikanth S. Narayanan: An articulatory analysis of phonological transfer using real-time MRI. 700-703
Louis ten Bosch, Okko Johannes Räsänen, Joris Driesen, Guillaume Aimetti, Toomas Altosaar, Lou Boves, A. Corns: Do multiple caregivers speed up language acquisition? 704-707
ASR: Lexical and Prosodic Models
Antoine Laurent, Paul Deléglise, Sylvain Meignier: Grapheme to phoneme conversion using an SMT system. 708-711
Long Nguyen, Tim Ng, Kham Nguyen, Rabih Zbib, John Makhoul: Lexical and phonetic modeling for Arabic automatic speech recognition. 712-715
Gina-Anne Levow: Assessing context and learning for isiZulu tone recognition. 716-719
Simon Dobrisek, Bostjan Vesnicer, France Mihelic: A sequential minimization algorithm for finite-state pronunciation lexicon models. 720-723
Kornel Laskowski, Mattias Heldner, Jens Edlund: A general-purpose 32 ms prosodic vector for hidden Markov modeling. 724-727
Dong Yang, Yi-Cheng Pan, Sadaoki Furui: Vocabulary expansion through automatic abbreviation generation for Chinese voice search. 728-731
Unit-Selection Synthesis
Qi Miao, Alexander Kain, Jan P. H. van Santen: Perceptual cost function for cross-fading based concatenation. 732-735
Daniel Tihelka, Jan Romportl: Exploring automatic similarity measures for unit selection tuning. 736-739
Cédric Boidin, Olivier Boëffard, Thierry Moudenc, Géraldine Damnati: Towards intonation control in unit selection speech synthesis. 740-743
Jerome R. Bellegarda: A novel approach to cost weighting in unit selection TTS. 744-747
Abubeker Gamboa Rosales, Hamurabi Gamboa Rosales, Ruediger Hoffmann: Maximum likelihood unit selection for corpus-based speech synthesis. 748-751
Shinsuke Sakai, Ranniery Maia, Hisashi Kawai, Satoshi Nakamura: A close look into the probabilistic concatenation model for corpus-based speech synthesis. 752-755
Speech and Audio Segmentation and Classification
Michael Wiesenegger, Franz Pernkopf: Wavelet-based speaker change detection in single channel speech data. 836-839
Laura Docío Fernández, Paula Lopez-Otero, Carmen García-Mateo: An adaptive threshold computation for unsupervised speaker segmentation. 840-843
Gibak Kim, Philipos C. Loizou: A data-driven approach for estimating the time-frequency binary mask. 844-847
Haolang Zhou, Damianos Karakos, Andreas G. Andreou: A semi-supervised version of heteroscedastic linear discriminant analysis. 848-851
Okko Johannes Räsänen, Unto K. Laine, Toomas Altosaar: Self-learning vector quantization for pattern discovery from speech. 852-855
Rohit Prabhavalkar, Zhaozhang Jin, Eric Fosler-Lussier: Monaural segregation of voiced speech using discriminative random fields. 856-859
Chi Zhang, John H. L. Hansen: Advancements in whisper-island detection within normally phonated audio streams. 860-863
Matthias Zimmermann: Joint segmentation and classification of dialog acts using conditional random fields. 864-867
Claire Brierley, Eric Atwell: Exploring complex vowels as phrase break correlates in a corpus of English speech with proPOSEL, a prosody and POS English lexicon. 868-871
Caroline Clemens, Stefan Feldes, Karlheinz Schuhmacher, Joachim Stegmann: Automatic topic detection of recorded voice messages. 872-875
Jindrich Matousek, Radek Skarnitzl, Pavel Machac, Jan Trmal: Identification and automatic detection of parasitic speech sounds. 876-879
Daniel R. van Niekerk, Etienne Barnard: Phonetic alignment for speech synthesis in under-resourced languages. 880-883
Kalu U. Ogbureke, Julie Carson-Berndsen: Improving initial boundary estimation for HMM-based automatic phonetic segmentation. 884-887
Speaker Recognition and Diarisation
Howard Lei, Eduardo López Gonzalo: Importance of nasality measures for speaker recognition data selection and performance prediction. 888-891
Ning Wang, P. C. Ching, Tan Lee: Exploration of vocal excitation modulation features for speaker recognition. 892-895
Xing Fan, John H. L. Hansen: Speaker identification for whispered speech using modified temporal patterns and MFCCs. 896-899
Runxin Li, Tanja Schultz, Qin Jin: Improving speaker segmentation via speaker identification and text segmentation. 904-907
David A. van Leeuwen: Overall performance metrics for multi-condition speaker recognition evaluations. 908-911
Matthias Wölfel, Qian Yang, Qin Jin, Tanja Schultz: Speaker identification using warped MVDR cepstral features. 912-915
Oshry Ben-Harush, Itshak Lapidot, Hugo Guterman: Entropy based overlapped speech detection as a pre-processing stage for speaker diarization. 916-919
Marijn Huijbregts, David A. van Leeuwen, Franciska M. G. de Jong: The majority wins: a method for combining speaker diarization systems. 924-927
Special Session: Advanced Voice Function Assessment
Krzysztof Izdebski, Yuling Yan, Melda Kunduk: Acoustic and high-speed digital imaging based analysis of pathological voice contributes to better understanding and differential diagnosis of neurological dysphonias and of mimicking phonatory disorders. 932-934
Maria E. Markaki, Yannis Stylianou: Normalized modulation spectral features for cross-database voice pathology detection. 935-938
Christophe Mertens, Francis Grenez, Jean Schoentgen: Speech sample salience analysis for speech cycle detection. 939-942
Viliam Rapcan, Shona D'Arcy, Nils Penard, Ian H. Robertson, Richard B. Reilly: The use of telephone speech recordings for assessment and monitoring of cognitive function in elderly people. 943-946
Sunil Nagaraja, Eduardo Castillo Guerra: Optimized feature set to assess acoustic perturbations in dysarthric speech. 947-950
Andreas Maier, Stefan Wenhardt, Tino Haderlein, Maria Schuster, Elmar Nöth: A microphone-independent visualization technique for speech disorders. 951-954
Rubén Fraile, Carmelo Sánchez, Juan Ignacio Godino-Llorente, Nicolás Sáenz-Lechón, Víctor Osma-Ruiz, Juana M. Gutiérrez: Evaluation of the effect of the GSM full rate codec on the automatic detection of laryngeal pathologies based on cepstral analysis. 955-958
Ali Alpan, Jean Schoentgen, Youri Maryn, Francis Grenez, P. Murphy: Cepstral analysis of vocal dysperiodicities in disordered connected speech. 959-962
Lise Crevier-Buchman, Stephanie Borel, Stéphane Hans, Madeleine Menard, Jacqueline Vaissière: Standard information from patients: the usefulness of self-evaluation (measured with the French version of the VHI). 963-966
Marcello Scipioni, Matteo Gerosa, Diego Giuliani, Elmar Nöth, Andreas Maier: Intelligibility assessment in children with cleft lip and palate in Italian and German. 967-970
Luis M. T. Jesus, Anna Barney, Ricardo Santos, Janine Caetano, Juliana Jorge, Pedro Sá Couto: Universidade de Aveiro's voice evaluation protocol. 971-974
Automotive and Mobile Applications
Hoon Chung, JeonGue Park, HyeonBae Jeon, Yunkeun Lee: Fast speech recognition for voice destination entry in a car navigation system. 975-978
Yun-Cheng Ju, Michael L. Seltzer, Ivan Tashev: Improving perceived accuracy for in-car media search. 979-982
Florian Schiel, Christian Heinrich: Laying the foundation for in-car alcohol detection by speech. 983-986
Charl Johannes van Heerden, Johan Schalkwyk, Brian Strope: Language modeling for what-with-where on GOOG-411. 991-994
Jan Nouza, Petr Cerva, Jindrich Zdánský: Very large vocabulary voice dictation for mobile devices. 995-998
Prosody: Production I, II
Diana V. Dimitrova, Gisela Redeker, John C. J. Hoeks: Did you say a BLUE banana? The prosody of contrast and abnormality in Bulgarian and Dutch. 999-1002
Hansjörg Mixdorff, Hartmut R. Pfitzinger: A quantitative study of F0 peak alignment and sentence modality. 1003-1006
Szu-wei Chen, Bei Wang, Yi Xu: Closely related languages, different ways of realizing focus. 1007-1010
Plínio Almeida Barbosa, Céu Viana, Isabel Trancoso: Cross-variety rhythm typology in Portuguese. 1011-1014
Marie Nilsenová, Marc Swerts, Véronique Houtepen, Heleen Dittrich: Pitch adaptation in different age groups: boundary tones versus global pitch. 1015-1018
Willemijn Heeren, Vincent J. van Heuven: Perception and production of boundary tones in whispered Dutch. 2411-2414
Katrin Schweitzer, Arndt Riester, Michael Walsh, Grzegorz Dogil: Pitch accents and information status in a German radio news corpus. 2415-2418
Adrian Leemann, Keikichi Hirose, Hiroya Fujisaki: Analysis of voice fundamental frequency contours of continuing and terminating prosodic phrases in four Swiss German dialects. 2419-2422
Michelina Savino: Intonational features for identifying regional accents of Italian. 2423-2426
Agnieszka Wagner: Analysis and recognition of accentual patterns. 2427-2430
Nigel G. Ward, Rafael Escalante-Ruiz: Using responsive prosodic variation to acknowledge the user's current state. 2431-2434
Oliver Niebuhr: Intonation segments and segmental intonation. 2435-2438
David House, Anastasia Karlsson, Jan-Olof Svantesson, Damrong Tayanin: The phrase-final accent in Kammu: effects of tone, focus and engagement. 2439-2442
Raya Kalaldeh, Amelie Dorn, Ailbhe Ní Chasaide: Tonal alignment in three varieties of Hiberno-English. 2443-2446
Lourdes Aguilar, Antonio Bonafonte, Francisco Campillo, David Escudero Mancebo: Determining intonational boundaries from the acoustic signal. 2447-2450
Hartmut R. Pfitzinger, Hansjörg Mixdorff, Jan Schwarz: Comparison of Fujisaki-model extractors and F0 stylizers. 2455-2458
Caterina Petrone, Mariapaola D'Imperio: Is tonal alignment interpretation independent of methodology? 2459-2462
Margaret Zellers, Brechtje Post, Mariapaola D'Imperio: Modeling the intonation of topic structure: two approaches. 2463-2466
ASR: Spoken Language Understanding
Silvia Quarteroni, Giuseppe Riccardi, Marco Dinarelli: What's in an ontology for spoken language understanding. 1023-1026
Hiroaki Nanjo, Hiroki Mikami, Hiroshi Kawano, Takanobu Nishiura: A fundamental study of shouted speech for acoustic-based security system. 1027-1030
Timo Baumann, Okko Buß, Michaela Atterer, David Schlangen: Evaluating the potential utility of ASR n-best lists for incremental spoken dialogue systems. 1031-1034
Bin Zhang, Wei Wu, Jeremy G. Kahn, Mari Ostendorf: Improving the recognition of names by document-level clustering. 1035-1038
Frédéric Béchet, Alexis Nasr: Robust dependency parsing for spoken language understanding of spontaneous speech. 1039-1042
Chao-Hong Liu, Chung-Hsien Wu: Semantic role labeling with discriminative feature selection for spoken language understanding. 1043-1046
Speaker Diarisation
Douglas A. Reynolds, Patrick Kenny, Fabio Castaldo: A study of new approaches to speaker diarization. 1047-1050
Themos Stafylakis, Vassilios Katsouros, George Carayannis: Redefining the Bayesian information criterion for speaker diarisation. 1051-1054
Shih-Sian Cheng, Chun-Han Tseng, Chia-Ping Chen, Hsin-Min Wang: Speaker diarization using divide-and-conquer. 1055-1058
Deepu Vijayasenan, Fabio Valente, Hervé Bourlard: KL realignment for speaker diarization with multiple feature streams. 1059-1062
Marijn Huijbregts, David A. van Leeuwen, Franciska M. G. de Jong: Speech overlap detection in a two-pass speaker diarization system. 1063-1066
Kyu Jeong Han, Shrikanth S. Narayanan: Improved speaker diarization of meeting speech with recurrent selection of representative speech segments and participant interaction pattern modeling. 1067-1070
Speech Processing with Audio or Audiovisual Input
Henry Widjaja, Suryoadhi Wibowo: Application of differential microphone array for IS-127 EVRC rate determination algorithm. 1123-1126
Alberto Yoshihiro Nakano, Seiichi Nakagawa, Kazumasa Yamamoto: Estimating the position and orientation of an acoustic source with a microphone array network. 1127-1130
Vishweshwara Rao, S. Ramakrishnan, Preeti Rao: Singing voice detection in polyphonic music using predominant pitch. 1131-1134
Juan Pablo Arias, Néstor Becerra Yoma, Hiram Vivanco: Word stress assessment for computer aided language learning. 1135-1138
Adrien Leman, Julien Faure, Etienne Parizet: A non-intrusive signal-based model for speech quality evaluation using automatic classification of background noises. 1139-1142
Kouhei Sumi, Tatsuya Kawahara, Jun Ogata, Masataka Goto: Acoustic event detection for spotting "hot spots" in podcasts. 1143-1146
Taras Butko, Cristian Canton-Ferrer, Carlos Segura, Xavier Giró, Climent Nadeu, Javier Hernando, Josep R. Casas: Improving detection of acoustic events using audiovisual data and feature level fusion. 1147-1150
Miguel Bugalho, José Portelo, Isabel Trancoso, Thomas Pellegrini, Alberto Abad: Detecting audio events for semantic video search. 1151-1154
Mickael Rouvier, Driss Matrouf, Georges Linarès: Factor analysis for audio-based video genre classification. 1155-1158
Mickael Rouvier, Georges Linarès, Driss Matrouf: Robust audio-based classification of video genre. 1159-1162
Joerg Schmalenstroeer, Martin Kelling, Volker Leutnant, Reinhold Haeb-Umbach: Fusing audio and video information for online speaker diarization. 1163-1166
Girija Chetty, Michael Wagner: Multimodal speaker verification using ancillary known speaker characteristics such as gender or age. 1167-1170
Guillaume Aimetti, Roger K. Moore, Louis ten Bosch, Okko Johannes Räsänen, Unto Kalervo Laine: Discovering keywords from cross-modal input: ecological vs. engineering methods for enhancing acoustic repetitions. 1171-1174
ASR: Decoding and Confidence Measures
Miroslav Novak: Incremental composition of static decoding graphs. 1175-1178
Jacques Duchateau, Kris Demuynck, Hugo Van hamme: Evaluation of phone lattice based speech decoding. 1179-1182
Jike Chong, Ekaterina Gonina, Youngmin Yi, Kurt Keutzer: A fully data parallel WFST-based large vocabulary continuous speech recognition on a graphics processing unit. 1183-1186
Benjamin Lecouteux, Georges Linarès, Benoît Favre: Combined low level and high level features for out-of-vocabulary word detection. 1187-1190
Björn Hoffmeister, Ralf Schlüter, Hermann Ney: Bayes risk approximations using time overlap with an application to system combination. 1191-1194
Christopher M. White, Ariya Rastrow, Sanjeev Khudanpur, Frederick Jelinek: Unsupervised estimation of the language model scaling factor. 1195-1198
Atsunori Ogawa, Atsushi Nakamura: Simultaneous estimation of confidence and error cause in speech recognition using discriminative model. 1199-1202
Cyril Allauzen, Michael Riley, Johan Schalkwyk: A generalized composition algorithm for weighted finite-state transducers. 1203-1206
Stefano Scanzio, Pietro Laface, Daniele Colibro, Roberto Gemello: Word confidence using duration models. 1207-1210
Preethi Jyothi, Eric Fosler-Lussier: A comparison of audio-free speech recognition error prediction methods. 1211-1214
Petr Motlícek: Automatic out-of-language detection based on confidence measures derived from LVCSR word and phone lattices. 1215-1218
Robust Automatic Speech Recognition I-III
Randy Gomez, Tatsuya Kawahara: Optimization of dereverberation parameters based on likelihood of speech recognizer. 1223-1226
Jort F. Gemmeke, Yujun Wang, Maarten Van Segbroeck, Bert Cranen, Hugo Van hamme: Application of noise robust MDT speech recognition on the SPEECON and SpeechDat-Car databases. 1227-1230
Alexander Krueger, Reinhold Haeb-Umbach: Model based feature enhancement for automatic speech recognition in reverberant environments. 1231-1234
Masakiyo Fujimoto, Kentaro Ishizuka, Tomohiro Nakatani: A study of mutual front-end processing method based on statistical model for noise robust speech recognition. 1235-1238
