Discussion:
FTS performance issue probably due to wrong planner estimate of detoasting
(too old to reply)
Stefan Keller
2013-02-08 00:52:46 UTC
Permalink
Hi,

I have problems with the performance of FTS in a query like this:

SELECT * FROM FullTextSearch WHERE content_tsv_gin @@
plainto_tsquery('english', 'good');

It's slow (> 30 sec.) for some GB (27886 html files, originally 73 MB zipped).
The planner obviously always chooses table scan: http://explain.depesz.com/s/EEE
I have to check again, if I'm doing something wrong but I'm pretty
sure it has to do with de-toasting and (wrong?) cost estimations.

I've seen some comments here saying that estimating detoasting costs
(especially with operator "@@" and GIN index) is an open issue (since
years?).
And I found a nice blog here [1] which uses 9.2/9.1 and proposes to
disable sequential table scan (SET enable_seqscan off;). But this is
no option for me since other queries still need seqscan.
Can anyone tell me if is on some agenda here (e.g. as an open item for >9.2)?

Yours, Stefan

[1] http://palominodb.com/blog/2012/03/06/considerations-about-text-searchs-big-fields-and-planner-costs
--
Sent via pgsql-performance mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Pavel Stehule
2013-02-08 05:45:57 UTC
Permalink
Hello

you can try to wrap searching to immutable function and use following trick

http://postgres.cz/wiki/PostgreSQL_SQL_Tricks#Using_IMMUTABLE_functions_as_hints_for_the_optimizer

Regards

Pavel Stehule
Post by Stefan Keller
Hi,
plainto_tsquery('english', 'good');
It's slow (> 30 sec.) for some GB (27886 html files, originally 73 MB zipped).
The planner obviously always chooses table scan: http://explain.depesz.com/s/EEE
I have to check again, if I'm doing something wrong but I'm pretty
sure it has to do with de-toasting and (wrong?) cost estimations.
I've seen some comments here saying that estimating detoasting costs
years?).
And I found a nice blog here [1] which uses 9.2/9.1 and proposes to
disable sequential table scan (SET enable_seqscan off;). But this is
no option for me since other queries still need seqscan.
Can anyone tell me if is on some agenda here (e.g. as an open item for >9.2)?
Yours, Stefan
[1] http://palominodb.com/blog/2012/03/06/considerations-about-text-searchs-big-fields-and-planner-costs
--
http://www.postgresql.org/mailpref/pgsql-performance
--
Sent via pgsql-performance mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Jesper Krogh
2013-02-08 06:56:03 UTC
Permalink
Post by Stefan Keller
Hi,
plainto_tsquery('english', 'good');
It's slow (> 30 sec.) for some GB (27886 html files, originally 73 MB zipped).
The planner obviously always chooses table scan: http://explain.depesz.com/s/EEE
I have to check again, if I'm doing something wrong but I'm pretty
sure it has to do with de-toasting and (wrong?) cost estimations.
If you havent done it .. bump up statistics target on the column and
re-analyze, see what that gives.

I have also been playing with the cost-numbers in order to get it to favour
an index-scan more often. That is lowering random_page_cost to be close to
seq_page_cost, dependent on your system, the amount of memory, etc, then
this can have negative side-effects on non-gin-queries.
--
Jesper
--
Sent via pgsql-performance mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Stefan Keller
2013-02-08 17:30:06 UTC
Permalink
Hi Jesper and Pavel

Thx for your hints.
I'm rather reluctant in tuning with unwanted side effects, We'll see.
I have to setup my system and db again before I can try out your tricks.

Yours, Stefan
Post by Jesper Krogh
Post by Stefan Keller
Hi,
plainto_tsquery('english', 'good');
It's slow (> 30 sec.) for some GB (27886 html files, originally 73 MB zipped).
http://explain.depesz.com/s/EEE
I have to check again, if I'm doing something wrong but I'm pretty
sure it has to do with de-toasting and (wrong?) cost estimations.
If you havent done it .. bump up statistics target on the column and
re-analyze, see what that gives.
I have also been playing with the cost-numbers in order to get it to favour
an index-scan more often. That is lowering random_page_cost to be close to
seq_page_cost, dependent on your system, the amount of memory, etc, then
this can have negative side-effects on non-gin-queries.
--
Jesper
--
Sent via pgsql-performance mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Loading...