
Latent Semantic Indexing: A Probabilistic Analysis.

Christos H. Papadimitriou, Prabhakar Raghavan, Hisao Tamaki, Santosh Vempala: Latent Semantic Indexing: A Probabilistic Analysis. PODS 1998: 159-168
@inproceedings{DBLP:conf/pods/PapadimitriouRTV98,
  author    = {Christos H. Papadimitriou and
               Prabhakar Raghavan and
               Hisao Tamaki and
               Santosh Vempala},
  editor    = {Alberto O. Mendelzon and
               Jan Paredaens},
  title     = {Latent Semantic Indexing: A Probabilistic Analysis},
  booktitle = {Proceedings of the Seventeenth ACM SIGACT-SIGMOD-SIGART Symposium
               on Principles of Database Systems, June 1-3, 1998, Seattle, Washington,
               USA},
  publisher = {ACM Press},
  year      = {1998},
  isbn      = {0-89791-996-3},
  pages     = {159-168},
  ee        = {http://doi.acm.org/10.1145/275487.275505, db/conf/pods/PapadimitriouRTV98.html},
  crossref  = {DBLP:conf/pods/98},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

Abstract

Latent semantic indexing (LSI) is an information retrieval technique based on the spectral analysis of the term-document matrix, whose empirical success had heretofore been without rigorous prediction and explanation. We prove that, under certain conditions, LSI does succeed in capturing the underlying semantics of the corpus and achieves improved retrieval performance. We also propose the technique of random projection as a way of speeding up LSI. We complement our theorems with encouraging experimental results. We also argue that our results may be viewed in a more general framework, as a theoretical basis for the use of spectral methods in a wider class of applications such as collaborative filtering.
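The two ingredients named in the abstract, a rank-k spectral truncation of the term-document matrix and a random projection that shrinks the matrix beforehand, can be illustrated on a toy corpus. This is only a sketch and not the authors' code: the five-term corpus, all function names, the power-iteration routine used in place of a library SVD, and the Gaussian projection matrix are assumptions made for a dependency-free illustration.

```python
import math
import random

def transpose(A):
    return [list(col) for col in zip(*A)]

def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def norm(x):
    return math.sqrt(sum(v * v for v in x))

def cosine(x, y):
    return sum(a * b for a, b in zip(x, y)) / (norm(x) * norm(y))

def top_singular_triples(A, k, iters=200, seed=0):
    """Approximate the top-k singular triples (sigma, u, v) of A using
    power iteration on A^T A with deflation. Assumes rank(A) >= k."""
    rng = random.Random(seed)
    At = transpose(A)
    triples = []
    for _ in range(k):
        v = [rng.gauss(0, 1) for _ in range(len(A[0]))]
        for _ in range(iters):
            # Remove components along right singular vectors found so far.
            for _, _, vj in triples:
                dot = sum(a * b for a, b in zip(v, vj))
                v = [a - dot * b for a, b in zip(v, vj)]
            w = matvec(At, matvec(A, v))
            nw = norm(w)
            v = [x / nw for x in w]
        Av = matvec(A, v)
        sigma = norm(Av)
        triples.append((sigma, [x / sigma for x in Av], v))
    return triples

def lsi_document_vectors(A, k):
    """Rank-k LSI coordinates of each document (column of A):
    document j maps to (sigma_1 * v_1[j], ..., sigma_k * v_k[j])."""
    triples = top_singular_triples(A, k)
    return [[s * v[j] for s, _, v in triples] for j in range(len(A[0]))]

def random_project(A, l, seed=0):
    """Map the m-dimensional term space to l dimensions with a random
    Gaussian matrix (an assumed, commonly used choice of projection)."""
    rng = random.Random(seed)
    R = [[rng.gauss(0, 1) / math.sqrt(l) for _ in range(len(A))]
         for _ in range(l)]
    cols = transpose(A)
    return [[sum(r * c for r, c in zip(row, col)) for col in cols]
            for row in R]

# Hypothetical toy corpus. Rows are terms, columns are documents.
#            d0 d1 d2 d3
A = [
    [1, 0, 0, 0],  # car
    [0, 1, 0, 0],  # automobile
    [1, 1, 0, 0],  # engine
    [0, 0, 1, 1],  # flower
    [0, 0, 1, 0],  # petal
]

docs = lsi_document_vectors(A, 2)
raw = transpose(A)
print("raw cosine d0,d1:", cosine(raw[0], raw[1]))
print("LSI cosine d0,d1:", cosine(docs[0], docs[1]))
print("LSI cosine d0,d2:", cosine(docs[0], docs[2]))
```

In this toy corpus, documents d0 and d1 share only the term "engine", so their raw cosine similarity is 0.5, yet in the rank-2 space they point in nearly the same direction because both belong to the same topic block. Calling `random_project(A, l)` before `lsi_document_vectors` gives the two-step variant. On a matrix this small it saves nothing and only shows the interface.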

Copyright © 1998 by the ACM, Inc., used by permission. Permission to make digital or hard copies is granted provided that copies are not made or distributed for profit or direct commercial advantage, and that copies show this notice on the first page or initial screen of a display along with the full citation.



Printed Edition

Alberto O. Mendelzon, Jan Paredaens (Eds.): Proceedings of the Seventeenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, June 1-3, 1998, Seattle, Washington, USA. ACM Press 1998, ISBN 0-89791-996-3

Online Edition: ACM Digital Library


References

[1]
...
[2]
...
[3]
...
[4]
...
[5]
...
[6]
...
[7]
...
[8]
...
[9]
Ronald Fagin: Combining Fuzzy Information from Multiple Systems. PODS 1996: 216-226
[10]
...
[11]
...
[12]
...
[13]
Norbert Fuhr: Probabilistic Models in Information Retrieval. Comput. J. 35(3): 243-255 (1992)
[14]
...
[15]
...
[16]
...
[17]
...
[18]
...
[19]
...
[20]
Mark Jerrum, Alistair Sinclair: Approximating the Permanent. SIAM J. Comput. 18(6): 1149-1178 (1989)
[21]
...
[22]
C. J. van Rijsbergen: Information Retrieval. Butterworth 1979, ISBN 0-408-70929-4
[23]
...
[24]
Howard R. Turtle, W. Bruce Croft: A Comparison of Text Retrieval Models. Comput. J. 35(3): 279-290 (1992)
[25]
...

Referenced by

  1. Jon M. Kleinberg, Andrew Tomkins: Applications of Linear Algebra in Information Retrieval and Hypertext Analysis. PODS 1999: 185-193
  2. Soumen Chakrabarti, Byron Dom, Rakesh Agrawal, Prabhakar Raghavan: Scalable Feature Selection, Classification and Signature Generation for Organizing Large Text Databases into Hierarchical Topic Taxonomies. VLDB J. 7(3): 163-178 (1998)

Last update Fri May 25 08:32:52 2012 CET by the DBLP Team. This material is Open Data released under the ODC-BY 1.0 license.