Daniel P. Lopresti, Shourya Roy, Klaus U. Schulz, L. Venkata Subramaniam (Eds.): Proceedings of the Second Workshop on Analytics for Noisy Unstructured Text Data, AND 2008, Singapore, July 24, 2008. ACM 2008, ACM International Conference Proceeding Series 303, ISBN 978-1-60558-196-5
Donna Harman: Some thoughts on failure analysis for noisy data.
John Tait: Noise and information.
Daniel P. Lopresti: Optical character recognition errors and their effects on natural language processing. 9-16
Ulrich Reffle, Annette Gotscharek, Christoph Ringlstetter, Klaus U. Schulz: Successfully detecting and correcting false friends using channel profiles. 17-22
Valentin Jijkoun, Mahboob Alam Khalid, Maarten Marx, Maarten de Rijke: Named entity normalization in user generated content. 23-30
Rema Ananthanarayanan, Vijil Chenthamarakshan, Prasad M. Deshpande, Raghuram Krishnapuram: Rule based synonyms for entity extraction from noisy text. 31-38
Jiyin He, Wouter Weerkamp, Martha Larson, Maarten de Rijke: Blogger, stick to your story: modeling topical noise in blogs with coherence measures. 39-46
Robert McArthur: Uncovering deep user context from blogs. 47-54
Soumya Datta, Sudeshna Sarkar: A comparative study of statistical features of language in blogs-vs-splogs. 63-66
Sreangsu Acharyya, Sumit Negi, L. Venkata Subramaniam, Shourya Roy: Unsupervised learning of multilingual short message service (SMS) dialect from noisy examples. 67-74
Antti Järvelin, Tuomas Talvensaari, Anni Järvelin: Data driven methods for improving mono- and cross-lingual IR performance in noisy environments. 75-82
Rachit Arora, Balaraman Ravindran: Latent Dirichlet allocation based multi-document summarization. 91-97
Amaresh Kumar Pandey, Tanveer J. Siddiqui: An unsupervised Hindi stemmer with heuristic improvements. 99-105
Anurag Bhardwaj, Faisal Farooq, Huaigu Cao, Venu Govindaraju: Topic based language models for OCR correction. 107-112



