
Introduction to Information Retrieval
Cambridge University Press, 7/7/2008
EAN 9780521865715, ISBN10: 0521865719
Hardcover, 506 pages, 26 x 18.5 x 3.1 cm
Language: English
Class-tested and coherent, this textbook teaches classical and web information retrieval from basic concepts, including web search and the related areas of text classification and text clustering. It gives an up-to-date treatment of all aspects of the design and implementation of systems for gathering, indexing, and searching documents; methods for evaluating systems; and an introduction to the use of machine learning methods on text collections. All the important ideas are explained using examples and figures, making it well suited to introductory courses in information retrieval for advanced undergraduates and graduate students in computer science. Based on feedback from extensive classroom experience, the book has been carefully structured to make teaching more natural and effective. Slides and additional exercises (with solutions for lecturers) are also available through the book's supporting website to help course instructors prepare their lectures.
1. Information retrieval using the Boolean model
2. The dictionary and postings lists
3. Tolerant retrieval
4. Index construction
5. Index compression
6. Scoring and term weighting
7. Vector space retrieval
8. Evaluation in information retrieval
9. Relevance feedback and query expansion
10. XML retrieval
11. Probabilistic information retrieval
12. Language models for information retrieval
13. Text classification and Naive Bayes
14. Vector space classification
15. Support vector machines and kernel functions
16. Flat clustering
17. Hierarchical clustering
18. Dimensionality reduction and latent semantic indexing
19. Web search basics
20. Web crawling and indexes
21. Link analysis
'This is the first book that gives you a complete picture of the complications that arise in building a modern web-scale search engine. You'll learn about ranking SVMs, XML, DNS, and LSI. You'll discover the seedy underworld of spam, cloaking, and doorway pages. You'll see how MapReduce and other approaches to parallelism allow us to go beyond megabytes and to efficiently manage petabytes.' Peter Norvig, Director of Research, Google Inc.