{"id":5022,"date":"2016-09-12T00:31:21","date_gmt":"2016-09-12T00:31:21","guid":{"rendered":"http:\/\/www.garysieling.com\/blog\/?p=5022"},"modified":"2016-09-12T00:31:21","modified_gmt":"2016-09-12T00:31:21","slug":"solr-query-value-closest-user-request","status":"publish","type":"post","link":"https:\/\/www.garysieling.com\/blog\/solr-query-value-closest-user-request\/","title":{"rendered":"Solr: Query for the value closest to a user request"},"content":{"rendered":"<p>For a project I&#8217;m working on (<a href=\"https:\/\/www.findlectures.com\/\">searching for lectures<\/a>), I thought it&#8217;d be nice to be able to search for lectures closest to a given length.<\/p>\n<p>E.g.:<\/p>\n<pre lang=\"javascript\">\n2 hour documentary\n45 minutes on hadoop\n<\/pre>\n<p>The way I handle this is to pre-process queries before they go to Solr, so that I can change both the query and the sort order.<\/p>\n<p>To do this, we need to inspect the query and extract any times. If one shows up, we then need to convert it to the equivalent value in seconds:<\/p>\n<pre lang=\"javascript\">\nconst timeQueryRe = \/(.*)(\\b(\\d+) (minute|minutes|hour|hours)\\b)(.*)\/i;\n\nconst matches = sentQuery.match(timeQueryRe);\nconst hourRe = \/\\bhours?\\b\/i;\n\nif (!!matches) {\n  const desiredLength = matches[3];\n  const desiredAmount = matches[4];\n  const factor = (!!desiredAmount.match(hourRe) ? 
3600 : 60);  \n  const perfectLength = parseInt(desiredLength, 10) * factor;\n}\n<\/pre>\n<p>Once we do that, we can use <a href=\"https:\/\/www.garysieling.com\/blog\/list-solr-functions\">functions<\/a> in the Solr sort parameter to find the values closest to what the user requested (a result one minute longer ranks ahead of one 15 minutes shorter):<\/p>\n<pre lang=\"javascript\">\nqq.sort = 'abs(sub(audio_length_f,' + perfectLength + ')) asc';\n<\/pre>\n<p>It&#8217;s possible that &#8220;45 minutes&#8221; is actually the name of something, so we need to keep it in the query in case that&#8217;s what the person actually wanted. Unfortunately, changing the sort order means that such a title match can show up anywhere in the results, but this is better than nothing.<\/p>\n<pre lang=\"javascript\">\nif (matches[1] !== '' || matches[5] !== '') {\n  query = ( \n    \"((\" + sentQuery + \n    \") OR (\" + \n    matches[1] + \" \" + matches[5] + \n    \"))\" \n  );\n} else {\n  query = ( \"*:*\" );\n}\n<\/pre>\n<p>An alternative approach is to create a field that holds a token for the closest length bucket, e.g.: &#8220;5m, 10m, 15m, &#8230;&#8221;. 
By doing this, you could query for every possible length, but boost the ones that are close, e.g.:<\/p>\n<pre lang=\"javascript\">\nlength:5m^5 length:10m^10 length:15m^5\n<\/pre>\n<p>This would scale better than my approach, and work in conjunction with relevancy search, but is more complex to index.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>How to query Solr for values closest to a given length<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"om_disable_all_campaigns":false,"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"footnotes":""},"categories":[12],"tags":[517],"aioseo_notices":[],"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/posts\/5022"}],"collection":[{"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/comments?post=5022"}],"version-history":[{"count":0,"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/posts\/5022\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/media?parent=5022"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/categories?post=5022"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/tags?post=5022"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}