{"id":1918,"date":"2013-09-06T00:08:14","date_gmt":"2013-09-06T00:08:14","guid":{"rendered":"http:\/\/www.garysieling.com\/blog\/?p=1918"},"modified":"2013-09-06T00:08:14","modified_gmt":"2013-09-06T00:08:14","slug":"querying-multiple-postgres-full-text-indexes","status":"publish","type":"post","link":"https:\/\/www.garysieling.com\/blog\/querying-multiple-postgres-full-text-indexes\/","title":{"rendered":"Querying Multiple Postgres Full-Text Indexes"},"content":{"rendered":"<h3>Setup<\/h3>\n<p>I&#8217;ve set up several scenarios to see how Postgres handles querying multiple full-text indexes.<\/p>\n<p>To demonstrate these scenarios I set up the following table and indexes, which has 1.1 million records:<\/p>\n<pre lang=\"sql\">CREATE TABLE data2\n(\n  author character varying,\n  id character varying,\n  email character varying,\n  company character varying,\n  date character varying,\n  message character varying,\n  github character varying,\n  search character varying,\n  name character varying\n)\n\nCREATE INDEX author_idx ON data2 USING gin(to_tsvector('english', author));\nCREATE INDEX message_idx ON data2 USING gin(to_tsvector('english', message));\nCREATE INDEX company_idx ON data2 USING gin(to_tsvector('english', company));<\/pre>\n<h3>Querying across columns<\/h3>\n<p>You can easily query across columns, or combine multiple queries.<\/p>\n<pre lang=\"sql\">SELECT * \nFROM data2\nWHERE \n  to_tsvector('english', message) @@ to_tsquery('android')\n  and to_tsvector('english', message) @@ to_tsquery('merge')\n  and to_tsvector('english', author) @@ to_tsquery('mike')\n  and to_tsvector('english', company) @@ to_tsquery('google')<\/pre>\n<p>This structure triggers the use of a Bitmap indexes. I&#8217;ve been investigating whether it is possible to combine the index for several columns into one, but this presents an interesting case in favor of separate indexes.<\/p>\n<p>When this is used to query multiple columns, it&#8217;s super-fast- bitmap operations generate lists of bits, which are combined at a low-level in large blocks with AND and OR, in the same format as the AND and OR in the query.<\/p>\n<pre>Bitmap Heap Scan on data2  \n(cost=138.99..143.03 rows=1 width=881)\"\n  Recheck Cond: \n  ( (to_tsvector('english'::regconfig, (message)::text) \n   @@ to_tsquery('android'::text)) \n   AND (to_tsvector('english'::regconfig, (message)::text) \n   @@ to_tsquery('merge'::text)) \n   AND (to_tsvector('english'::regconfig, (company)::text) \n   @@ to_tsquery('google'::text)))\n  Filter:\n    (to_tsvector('english'::regconfig, (author)::text) \n    @@ to_tsquery('mike'::text))\n  BitmapAnd  \n  (cost=138.99..138.99 rows=1 width=0)\n    Bitmap Index Scan on message_idx  \n    (cost=0.00..36.45 rows=44 width=0)\n      Index Cond: \n        ( (to_tsvector('english'::regconfig, (message)::text) \n        @@ to_tsquery('android'::text)) \n        AND (to_tsvector('english'::regconfig, (message)::text) \n        @@ to_tsquery('merge'::text)))\n    Bitmap Index Scan on company_idx  \n    (cost=0.00..102.29 rows=8838 width=0)\n      Index Cond: \n        (to_tsvector('english'::regconfig, (company)::text)\n        @@ to_tsquery('google'::text))<\/pre>\n<p>Similar to the last query, we can see that ORs also use bitmap operations:<\/p>\n<pre lang=\"sql\">SELECT * \nFROM data2\nWHERE \n  to_tsvector('english', message) @@ to_tsquery('android')\n  and to_tsvector('english', message) @@ to_tsquery('merge')\n  and to_tsvector('english', author) @@ to_tsquery('mike')\n  and to_tsvector('english', company) @@ to_tsquery('google')<\/pre>\n<h3>Full Text and Joins<\/h3>\n<p>The next question is how this works when the columns queried are in separate tables &#8211; clearly this is harder to use with bitmap indexing, because for bitmap logical operators to work, the rows must be lined up between the two tables to execute the join. In Oracle, a scenario resembling this is actually possible with the star transform, but in Postgres our options are more limited.<\/p>\n<p>To support this, we create two new tables, using the github URL as a join key:<\/p>\n<pre lang=\"sql\">create table author_github as \nselect distinct author, github\nfrom data2;\n\ncreate table company_github as \nselect distinct company, github\nfrom data2;\n\ncreate index author_github_github_idx on author_github(github);\ncreate index company_github_github_idx on company_github(github);\n\ncreate index author_github_author_idx ON author_github USING gin(to_tsvector('english', author));\ncreate index company_github_company_idx ON company_github USING gin(to_tsvector('english', company));<\/pre>\n<p>Here, Postgres does all the work for each table, sorts the results, then does a merge join (a zipper algorithm). This isn&#8217;t quite as compelling as the bitmap combinations, but still effective, since the tables being queried are much smaller to start.<\/p>\n<pre lang=\"sql\">select author, company\nfrom author_github ag\njoin company_github cg on ag.github = cg.github\nwhere to_tsvector('english', author) @@ to_tsquery('linus')\n  and to_tsvector('english', company) @@ to_tsquery('google')<\/pre>\n<p>Note that the indexes on the IDs are not used at all for this query.<\/p>\n<pre>Merge Join  \n(cost=662.89..701.81 rows=2553 width=22)\n  Merge Cond: \n  ( (cg.github)::text = (ag.github)::text)\n    Sort  \n    (cost=212.16..212.48 rows=126 width=36)\n      Sort Key: cg.github\n        Bitmap Heap Scan on company_github cg  \n        (cost=12.98..207.77 rows=126 width=36)\n          Recheck Cond: \n          (to_tsvector('english'::regconfig, (company)::text) \n          @@ to_tsquery('google'::text))\n            Bitmap Index Scan on company_github_company_idx  \n            (cost=0.00..12.95 rows=126 width=0)\n             Index Cond: \n               (to_tsvector('english'::regconfig, (company)::text) \n               @@ to_tsquery('google'::text))\n  Sort  \n  (cost=450.72..451.38 rows=261 width=43)\n    Sort Key: ag.github\n      Bitmap Heap Scan on author_github ag  \n      (cost=14.03..440.25 rows=261 width=43)\n        Recheck Cond: \n        (to_tsvector('english'::regconfig, (author)::text) \n        @@ to_tsquery('linus'::text))\n          Bitmap Index Scan on author_github_author_idx  \n          (cost=0.00..13.96 rows=261 width=0)\n            Index Cond: \n              (to_tsvector('english'::regconfig, (author)::text) \n              @@ to_tsquery('linus'::text))<\/pre>\n<h3>Querying across columns through views<\/h3>\n<p>As one final test, lets prove that this still works when the query is against a view. Each column is renamed, just to show the functionality.<\/p>\n<pre lang=\"sql\">create view merge_commits as \nselect author a, id i, company c, email e, date d\nfrom data2\nwhere to_tsvector('english', message) @@ to_tsquery('merge')<\/pre>\n<p>Within the view we have one full-text condition, then the query against the view adds a second:<\/p>\n<pre lang=\"sql\">select * from merge_commits\nwhere to_tsvector('english', a) @@ to_tsquery('mike')<\/pre>\n<p>Like the original query, this generates Bitmap Ands, as expected.<\/p>\n<pre>Bitmap Heap Scan on data2  \n(cost=216.86..392.01 rows=44 width=163)\n  Recheck Cond: \n  ( (to_tsvector('english'::regconfig, (author)::text) \n  @@ to_tsquery('mike'::text)) \n  AND (to_tsvector('english'::regconfig, (message)::text) \n  @@ to_tsquery('merge'::text)))\n    BitmapAnd\n    (cost=216.86..216.86 rows=44 width=0)\n      Bitmap Index Scan on author_idx\n      (cost=0.00..106.29 rows=8838 width=0)\n        Index Cond: \n        (to_tsvector('english'::regconfig, (author)::text) \n         @@ to_tsquery('mike'::text))\n      Bitmap Index Scan on message_idx  \n        (cost=0.00..110.29 rows=8838 width=0)\n        Index Cond: \n          (to_tsvector('english'::regconfig, (message)::text) \n          @@ to_tsquery('merge'::text))<\/pre>\n<p>Once these tables are set up correctly, Postgres seems to be able to handle a variety of scenarios, the best being multiple filters on a single table.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Setup I&#8217;ve set up several scenarios to see how Postgres handles querying multiple full-text indexes. To demonstrate these scenarios I set up the following table and indexes, which has 1.1 million records: CREATE TABLE data2 ( author character varying, id character varying, email character varying, company character varying, date character varying, message character varying, github &hellip; <\/p>\n<p class=\"link-more\"><a href=\"https:\/\/www.garysieling.com\/blog\/querying-multiple-postgres-full-text-indexes\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Querying Multiple Postgres Full-Text Indexes&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"om_disable_all_campaigns":false,"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"footnotes":""},"categories":[7],"tags":[157,242,437,523],"aioseo_notices":[],"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/posts\/1918"}],"collection":[{"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/comments?post=1918"}],"version-history":[{"count":0,"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/posts\/1918\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/media?parent=1918"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/categories?post=1918"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/tags?post=1918"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}