I recently read Relevant Search, in which the authors share their real-world experience tuning search engines (Elasticsearch and Solr) for high-quality results. The book is designed for people who want to understand how search engines work internally so they can deliver business value, rather than just learning to tinker with the settings these tools expose. While the book focuses primarily on Elasticsearch, the authors clearly understand the differences between Solr and Elasticsearch (e.g. term-centric vs. field-centric search), presumably from their consulting work.
The book explains the basics of how full-text search works for readers who need a refresher, and it quickly becomes insightful. For instance, going in I had a simplistic understanding of TF-IDF. The authors discuss where it performs poorly and what functionality Lucene-based products provide to get around these limitations. Both tools include an “explain” feature that is supposed to tell you how a ranking was computed, but its output takes some effort to learn to read, and the authors clearly spent many hours understanding it.
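To make the TF-IDF discussion concrete, here is a minimal Python sketch of textbook TF-IDF scoring (raw term count multiplied by log(N/df)). It is a deliberate simplification, not Lucene's actual formula (Lucene adds normalization and now defaults to BM25), but it exposes two weak spots: a term found in every document gets an IDF of zero, and without length normalization a long document that repeats a query term outranks a short, focused one. The function name and sample documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf_scores(query, docs):
    """Score each document by summing tf * idf over the query terms.

    tf  = raw count of the term in the document (no length normalization)
    idf = log(N / df), where df is the number of documents containing the term
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: how many documents contain each term at least once.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if df[term] == 0:
                continue  # term appears in no document, so it adds nothing
            idf = math.log(n_docs / df[term])
            score += counts[term] * idf
        scores.append(score)
    return scores


docs = [
    "the cat sat on the mat",
    "the dog chased the cat around the cat tree and the cat ran",
    "the bird sang",
]
print(tf_idf_scores("the cat", docs))
```

Here “the” contributes nothing to any score because it appears in every document, and the rambling second document scores highest simply because it mentions “cat” three times.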
The book offers a bunch of neat motivating examples: using full-text engines for map search and melody search, and designing year-range search. They don't discuss implementing security, but if you can work out how to build a search engine that handles geography, security should be easy by comparison. They also cover how to build specialty indexes for common features like type-ahead suggestions and spell check. If nothing else, I thought that chapter was an excellent resource for itemizing all the typical search UI features a business user might want.
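For type-ahead, one common approach in Lucene-based engines is to index edge n-grams (the leading prefixes of each word) so that a partial query becomes an exact term lookup at search time. The Python sketch below imitates that idea with an in-memory dictionary. The function names and sample titles are hypothetical, and a real deployment would rely on the engine's own analyzers or suggesters rather than code like this.

```python
from collections import defaultdict


def edge_ngrams(term, min_gram=1, max_gram=10):
    """Return the leading prefixes of a term, e.g. 'cat' -> ['c', 'ca', 'cat']."""
    return [term[:i] for i in range(min_gram, min(len(term), max_gram) + 1)]


def build_suggest_index(phrases):
    """Map every edge n-gram of every word to the set of phrases containing it."""
    index = defaultdict(set)
    for phrase in phrases:
        for word in phrase.lower().split():
            for gram in edge_ngrams(word):
                index[gram].add(phrase)
    return index


def suggest(index, typed, limit=5):
    """Treat the partial input as an exact key: no scan over all phrases needed."""
    return sorted(index.get(typed.lower(), set()))[:limit]


titles = ["Relevant Search", "Practical Recommender Systems", "Solr in Action"]
index = build_suggest_index(titles)
print(suggest(index, "re"))
```

The trade-off is a larger index in exchange for cheap lookups while the user is still typing, which is why the work happens at index time rather than query time.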
Beyond the technical problems, Relevant Search discusses the harder problem of search: how to work with other key teams in a business environment. To start, the authors recommend creating “user personas” to help envision how different people will navigate a product. This clearly matters for search applications, since many are aimed at either very sophisticated or unsophisticated users and rarely serve both groups well. There is also a heavy focus on building complex queries to handle different use cases, which surprised me, as I had assumed that “relevance” work mostly involved getting more data into the system.
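As a rough illustration of what such a layered query can look like, here is a Python sketch that assembles an Elasticsearch query DSL body as a plain dictionary. The field names (title, overview, release_year), the boost values, and the function name are hypothetical. The structure uses standard bool, multi_match (with the term-centric cross_fields type), match_phrase, and range clauses, but it is not a query taken from the book.

```python
import json


def build_movie_query(user_text, min_year=None):
    """Build a hypothetical Elasticsearch query body that layers several signals."""
    query = {
        "bool": {
            # Base relevance: cross_fields blends title and overview as if they
            # were one big field, which makes the match term-centric.
            "must": [
                {
                    "multi_match": {
                        "query": user_text,
                        "fields": ["title^3", "overview"],
                        "type": "cross_fields",
                    }
                }
            ],
            # Extra credit when the exact phrase appears in the title.
            "should": [
                {"match_phrase": {"title": {"query": user_text, "boost": 2}}}
            ],
        }
    }
    if min_year is not None:
        # Filter clauses narrow the result set without contributing to the score.
        query["bool"]["filter"] = [{"range": {"release_year": {"gte": min_year}}}]
    return {"query": query}


print(json.dumps(build_movie_query("space opera", min_year=1990), indent=2))
```

Keeping each signal in its own clause is what makes it possible to tune one use case (say, exact title matches) without disturbing the others.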
Testing is also clearly very labor-intensive. The authors recommend several strategies, including an app called Quepid [1] that helps automate the process. One of the challenges of tinkering with search results is quantifying how much effect a change has (especially in a product committee meeting). Relevance can't be checked with the pass/fail assertions of typical test-driven development, so Quepid apparently works by grading the results instead.
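One widely used way to turn graded results into a single comparable number is normalized discounted cumulative gain (nDCG). The Python sketch below assumes someone has graded each of the top results on a hypothetical 0 to 3 scale, and it uses the simple linear-gain variant of the formula. It illustrates the general idea of grading and is not necessarily how Quepid computes its scores.

```python
import math


def dcg(grades):
    """Discounted cumulative gain: each grade is divided by log2(rank + 1)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(grades, start=1))


def ndcg(grades):
    """DCG divided by the DCG of the ideal (best possible) ordering of the same grades."""
    ideal = dcg(sorted(grades, reverse=True))
    return dcg(grades) / ideal if ideal > 0 else 0.0


before = [1, 3, 0, 2]  # hypothetical grades of the top 4 results before a tuning change
after = [3, 2, 1, 0]   # hypothetical grades of the top 4 results after the change
print(round(ndcg(before), 3), round(ndcg(after), 3))
```

A higher score after a change suggests the better-graded documents moved toward the top, which gives a committee a number to discuss instead of anecdotes.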
Most custom search engines don’t see massive usage, but if yours does reach that stage, the authors briefly discuss interesting options for going further. For example, there is ongoing work to make Solr learn as it goes [2].
The authors propose a great idea: replacing our concept of “search” with “recommendation”. It seems there is a second Manning book in progress that follows this idea further, Practical Recommender Systems.