Solr: Query for the value closest to a user request

For a project I’m working on (searching for lectures), I thought it’d be nice to let people search for the lectures whose length is closest to a value they ask for.

E.g.:

2 hour documentary
45 minutes on hadoop

The way I handle this is to pre-process queries before they go to Solr, so that I can change both the query and the sort order.

To do this, we need to inspect the query and extract any time expressions. If one is present, we convert it to the equivalent value in seconds:

const timeQueryRe = /(.*)(\b(\d+) (minutes?|hours?)\b)(.*)/i;
const hourRe = /^hours?$/i;

// matches[1] = text before the time, [3] = number, [4] = unit, [5] = text after
const matches = sentQuery.match(timeQueryRe);
let perfectLength = null; // desired length in seconds, if the query gave one

if (matches) {
  const desiredLength = matches[3];
  const desiredAmount = matches[4];
  const factor = hourRe.test(desiredAmount) ? 3600 : 60;
  perfectLength = parseInt(desiredLength, 10) * factor;
}
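For experimenting outside the search code, the same extraction can be wrapped in a small function. This is a sketch rather than the code from my project: `parseDesiredLength` is a hypothetical name, and, like the regex above, it only understands whole numbers of minutes or hours.

```javascript
// Hypothetical helper: pull a length such as "45 minutes" or "2 hours"
// out of a free-text query and return it in seconds.
const TIME_QUERY_RE = /(.*)\b(\d+) (minutes?|hours?)\b(.*)/i;

function parseDesiredLength(sentQuery) {
  const m = sentQuery.match(TIME_QUERY_RE);
  if (!m) {
    return null; // no time expression: leave the query alone
  }
  const amount = parseInt(m[2], 10);
  // Hours become 3600 seconds each, minutes 60 seconds each
  const factor = /^hours?$/i.test(m[3]) ? 3600 : 60;
  return {
    seconds: amount * factor,
    // Whatever surrounded the time phrase, e.g. "on hadoop"
    remainder: (m[1] + " " + m[4]).trim(),
  };
}

const parsed = parseDesiredLength("45 minutes on hadoop");
console.log(parsed.seconds, parsed.remainder); // 2700 on hadoop
```

Returning the leftover text alongside the length keeps both pieces available for building the query later.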

Once we have that value, we can use a function in the Solr sort parameter to order results by how close their length is to what the user requested. Sorting on the absolute difference means a lecture one minute too long ranks above one that is 15 minutes too short:

qq.sort = 'abs(sub(audio_length_f,' + perfectLength + ')) asc';
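To see why sorting on the absolute difference gives that ordering, here is a sketch that reproduces it in plain JavaScript. The lectures and their lengths are made up, and it assumes `audio_length_f` stores seconds; in the real setup Solr does the sorting.

```javascript
// Hypothetical lectures; audio_length_f is assumed to hold seconds.
const lectures = [
  { title: "Intro to Hadoop", audio_length_f: 1800 },  // 30 min
  { title: "HDFS deep dive", audio_length_f: 2760 },   // 46 min
  { title: "MapReduce basics", audio_length_f: 3900 }, // 65 min
  { title: "YARN overview", audio_length_f: 2400 },    // 40 min
];

const perfectLength = 45 * 60; // "45 minutes" -> 2700 seconds
const sortParam = "abs(sub(audio_length_f," + perfectLength + ")) asc";

// Local stand-in for what Solr does with sortParam: order by the absolute
// distance between each lecture's length and the requested length
function sortByClosestLength(docs, target) {
  return [...docs].sort(
    (a, b) =>
      Math.abs(a.audio_length_f - target) - Math.abs(b.audio_length_f - target)
  );
}

console.log(sortParam); // abs(sub(audio_length_f,2700)) asc
console.log(sortByClosestLength(lectures, perfectLength).map((d) => d.title).join(", "));
// HDFS deep dive, YARN overview, Intro to Hadoop, MapReduce basics
```

The 46-minute lecture (60 seconds off) comes first and the 30-minute one (900 seconds off) only third, which is the behaviour described above.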

It’s possible that “45 minutes” is actually the name of something, so we keep the original text in the query, in case that’s what the person wanted, alongside a version with the time removed. Unfortunately, because results are now sorted by length rather than relevance, such an exact match will show up wherever its length puts it instead of at the top, but this is better than nothing.

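if (matches) {
  // Whatever the user typed around the time phrase, e.g. "on hadoop"
  const remainder = (matches[1] + " " + matches[5]).trim();
  if (remainder !== '') {
    // Match the literal query, or the query without the time phrase
    query = "((" + sentQuery + ") OR (" + remainder + "))";
  } else {
    // The query was only a length, e.g. "45 minutes": match everything
    // and let the sort order do the work
    query = "*:*";
  }
}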
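Putting the query and the sort together, the request parameters might be assembled like this. `buildSolrParams` is a hypothetical helper that takes the values extracted earlier; it uses the standard `URLSearchParams` class and assumes the parameters are sent to Solr’s `/select` handler.

```javascript
// Hypothetical helper: turn the extracted pieces into Solr request parameters.
// perfectLength is in seconds, or null if the query contained no length.
function buildSolrParams(sentQuery, perfectLength, remainder) {
  if (perfectLength === null) {
    // No length requested: ordinary relevance search
    return new URLSearchParams({ q: sentQuery });
  }
  const rest = remainder.trim();
  // Keep the literal text in case "45 minutes" is itself a title
  const q = rest !== ""
    ? "((" + sentQuery + ") OR (" + rest + "))"
    : "*:*"; // the query was only a length: match everything
  return new URLSearchParams({
    q,
    // Closest length first; audio_length_f assumed to hold seconds
    sort: "abs(sub(audio_length_f," + perfectLength + ")) asc",
  });
}

const params = buildSolrParams("45 minutes on hadoop", 2700, "on hadoop");
console.log(params.get("q"));    // ((45 minutes on hadoop) OR (on hadoop))
console.log(params.get("sort")); // abs(sub(audio_length_f,2700)) asc
console.log(params.toString());  // percent-encoded form for the request URL
```

Leaving `sort` out entirely when no length was found keeps Solr’s default relevance ordering for normal searches.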

An alternative approach is to index each lecture with a token for the length bucket it is closest to (“5m”, “10m”, “15m”, …). You could then query for every possible length, but boost the ones that are close. For a 10-minute request, that might look like:

length:5m^5 length:10m^10 length:15m^5

This would scale better than my approach and work in conjunction with relevance ranking, but it is more complex to index.
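A rough sketch of the bucketed alternative follows. The `length` field name, the 5-minute bucket size and the rule of halving the boost for each bucket away from the request are assumptions, chosen so the output reproduces the example query above; the token from `lengthToken` would be written into the field when each lecture is indexed.

```javascript
// Hypothetical sketch: bucket size, field name and boosts are assumptions.
const BUCKET_MINUTES = 5;

// Index time: map a length in seconds to the token of its nearest bucket
function lengthToken(seconds) {
  const bucket = Math.round(seconds / 60 / BUCKET_MINUTES) * BUCKET_MINUTES;
  return bucket + "m";
}

// Query time: boost the requested bucket most and halve the boost for each
// bucket further away, out to `spread` buckets on either side
function lengthClauses(seconds, spread = 1, maxBoost = 10) {
  const center = Math.round(seconds / 60 / BUCKET_MINUTES);
  const clauses = [];
  for (let d = -spread; d <= spread; d++) {
    const bucket = (center + d) * BUCKET_MINUTES;
    if (bucket < 0) continue; // no negative lengths
    const boost = maxBoost / 2 ** Math.abs(d);
    clauses.push("length:" + bucket + "m^" + boost);
  }
  return clauses.join(" ");
}

console.log(lengthToken(2760));      // 45m (46 minutes rounds to 45)
console.log(lengthClauses(10 * 60)); // length:5m^5 length:10m^10 length:15m^5
```

Widening `spread` gets closer to querying every possible length. Because these are ordinary optional clauses, they can be combined with the user’s text so that relevance still contributes to the score.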
