Fixing Issues Where Postgres Optimizer Ignores Full Text Indexes

When I attempted to implement faceted search against a Postgres full-text index, I found I had issues getting Postgres (9.2) to pick up the index:

CREATE INDEX search_idx2 ON data2 USING gin(to_tsvector('english', search));

SELECT * 
FROM (
  SELECT author, count(*) c
  FROM data2
  WHERE search @@ to_tsquery('linux')
  GROUP BY author
) counts 
ORDER BY c desc;

This query will do a full scan on the table (~9-12 minutes on a 1.1 million row table).

Sort 
(cost=233587.16..233587.17 rows=6 width=22)"
  Sort Key: (count(*))"
    HashAggregate  
    (cost=233586.96..233587.02 rows=6 width=14)"
      Seq Scan on data2 
     (cost=0.00..233578.12 rows=1768 width=14)"
        Filter: 
        1"

The fix is simply to apply the function used on the search column to the query, which cuts the time to ~23 seconds on a 1.1 million row table:

SELECT * 
FROM (
  SELECT author, count(*) c
  FROM data2
  WHERE to_tsvector('english', search) @@ to_tsquery('linux')
  GROUP BY author
) counts 
ORDER BY c desc;

And the resulting plan:

HashAggregate  
  (cost=74668.49..74669.28 rows=79 width=14) 
  (actual time=23726.071..23726.598 rows=1625 loops=1)
  Bitmap Heap Scan on data2  
    (cost=306.87..74535.03 rows=26692 width=14) 
    (actual time=15.051..23683.230 rows=27604 loops=1)

    Recheck Cond: (to_tsvector('english'::regconfig, (search)::text)
                  @@ to_tsquery('linux'::text))
    Rows Removed by Index Recheck: 42867
      Bitmap Index Scan on search_idx2  
      (cost=0.00..300.19 rows=26692 width=0) 
      (actual time=11.436..11.436 rows=27604 loops=1)
        Index Cond: (to_tsvector('english'::regconfig, (search)::text) 
                    @@ to_tsquery('linux'::text))
Total runtime: 23727.012 ms
  1. search)::text @@ to_tsquery('linux'::text []

Leave a Reply

Your email address will not be published. Required fields are marked *