Sorting by random values in Solr is an interesting concept. A few people have done this [1], but I want to expand on it with some more options here. First, there is a built-in random field [2] that you can sort by.
If you just want a random order, sort on a field whose name starts with random_, e.g. sort=random_1234 desc.
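As a minimal sketch of what such a request could look like, the Python below builds the query string for a purely random sort. The host, the core name "mycore", and the seed suffix 1234 are placeholders made up for illustration, and it assumes your schema defines a random_* dynamic field.

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint; adjust host and core name for your setup.
SOLR_SELECT_URL = "http://localhost:8983/solr/mycore/select"


def random_sort_url(seed, query="*:*", rows=10):
    """Build a select URL that sorts results by a random_* field.

    The text after "random_" is only there to vary the field name,
    which is what seeds the ordering.
    """
    params = {
        "q": query,
        "rows": rows,
        "sort": f"random_{seed} desc",
    }
    return f"{SOLR_SELECT_URL}?{urlencode(params)}"


print(random_sort_url(1234))
```

Changing the suffix (1234 here) is all it takes to get a different ordering.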
This relies on the dynamic field that ships in Solr’s example schema, which treats any field whose name starts with random_ as a random sort field (type solr.RandomSortField).
Unfortunately you can’t use this field with a “copyField” to put random numbers into your index, as you might expect, so if you want the numbers stored in the index itself, you’ll have to generate them in your loading mechanism.
Sorting by a random field does not add data to the index either; it just uses the field name to seed the random number. I tested this by sorting by a number of fields and checking the index size over time. Every time you use the same field name, you will get the same sort order. If you don’t want this, you’ll need to generate new names. For example, if you build the name from the current date, the sort order will change once a day automatically.
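The date-based naming trick can be sketched in a few lines of Python. The helper name daily_random_field is hypothetical, and the random_ prefix assumes the dynamic field described above.

```python
from datetime import date


def daily_random_field(today=None):
    """Return a random_* field name that changes once per calendar day."""
    today = today or date.today()
    # e.g. 5 January 2024 becomes "random_20240105"
    return f"random_{today:%Y%m%d}"


# Use it as the sort parameter of a query.
sort_param = f"{daily_random_field()} desc"
print(sort_param)
```

Swapping the date format for one that includes the hour (or a user ID) would give a per-hour or per-user ordering in the same way.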
If you prefer to sort by relevancy score (or another attribute) and use the random field only to break ties, list it as a secondary sort, e.g. sort=score desc, random_1234 desc.
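A small sketch of the tie-breaking sort, again in Python. The function name, the seed 42, and the example query are illustrative assumptions, not anything Solr prescribes.

```python
def tie_break_sort(seed, primary="score desc"):
    """Sort by the primary clause first, then randomly among ties."""
    return f"{primary}, random_{seed} desc"


# Parameters for a request that ranks by relevancy and shuffles equal scores.
params = {
    "q": "title:solr",  # hypothetical query
    "sort": tie_break_sort(42),
}
print(params["sort"])
```

Any other sortable field can stand in for the primary clause, e.g. tie_break_sort(42, primary="price asc").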
If you want to do something more complex, this usage can fall apart. For instance, say you want to keep Solr’s relevancy algorithm, but fuzz it a bit. At this writing there does not seem to be a random function available, so you may need to consider adding random numbers into your index.
Suppose you are designing a system where each document has a topic, and you want roughly even numbers of documents from each topic near the top of the search results (all other things being equal).
To do this, you can create a field containing a randomized numeric value; call it “topic_boost”. For each document, you choose a random number at index time, from 0 to the number of documents in that document’s topic. Because the values in every topic are spread at the same density, search results sorted by this value show a roughly even number of documents from each topic until a topic runs out.
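Here is a minimal sketch of how the loading step might assign these values, assuming documents are plain dicts with a "topic" key before they are sent to Solr. The function name and the sample documents are made up for illustration.

```python
import random
from collections import Counter


def assign_topic_boost(docs, rng=random):
    """Add a topic_boost field to each document dict, in place.

    Each value is drawn uniformly from 0 up to (but not including)
    the number of documents that share the document's topic.
    """
    topic_sizes = Counter(doc["topic"] for doc in docs)
    for doc in docs:
        doc["topic_boost"] = rng.randrange(topic_sizes[doc["topic"]])
    return docs


# Hypothetical documents: three about cats, one about dogs.
docs = [
    {"id": "1", "topic": "cats"},
    {"id": "2", "topic": "cats"},
    {"id": "3", "topic": "cats"},
    {"id": "4", "topic": "dogs"},
]
assign_topic_boost(docs, random.Random(0))  # fixed seed for repeatability
for doc in docs:
    print(doc)
```

A larger topic gets a wider range of values, so each topic contributes about one document per unit of topic_boost, which is what evens out the top of the results.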
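If you are using the edismax or dismax parser, you can then add this random number to the score, for example through the bf (boost function) parameter, which adds a function’s value to the relevancy score.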
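As a hedged sketch, the request parameters could be assembled like this. I am assuming the additive bf parameter here; the helper name and the example query are hypothetical.

```python
from urllib.parse import urlencode


def boosted_query_params(user_query, boost_field="topic_boost"):
    """Parameters for an edismax query with an additive field boost."""
    return {
        "defType": "edismax",
        "q": user_query,
        # bf adds the value of the named field to each document's score.
        "bf": boost_field,
    }


params = boosted_query_params("solr sorting")
print(urlencode(params))
```

Swapping "edismax" for "dismax" should work the same way, since both parsers accept bf.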
When I was doing this, I found that it was easier to store negative values in this field, because otherwise the results sort in the wrong order: a higher score ranks first, while the smallest random values are the ones that should surface first. This also made it really easy to identify documents that didn’t have this treatment applied correctly, as any document with a positive number for its score was missing the randomized field.
The nice thing about this is that you can do this multiple times and add the results, or weight them to affect the balance of results you get back.
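To illustrate combining several such fields, the sketch below builds a bf value with per-field weights, using the field^weight form that bf accepts for boosts. The second field, author_boost, and the weights are invented for the example.

```python
def weighted_bf(weights):
    """Join field/weight pairs into a single bf parameter value."""
    return " ".join(f"{field}^{weight}" for field, weight in weights.items())


# Hypothetical: favour topic balance four times as strongly as author balance.
bf_value = weighted_bf({"topic_boost": 2.0, "author_boost": 0.5})
print(bf_value)
```

Tuning the weights is then a matter of changing the numbers in that dict and re-running your test queries.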