John Lumby
2013-02-18 16:45:01 UTC
Re: Why am I getting great/terrible estimates with these CTE queries?

On 2012-10-09 23:09:21:
> You're assuming the case where the estimate is better is better for a
> reason ... but it's only better as a result of blind dumb luck.  The
> outer-level query planner doesn't know anything about the CTE's output
> except the estimated number of rows --- in particular, it doesn't drill
> down to find any statistics about the join column.

I am also struggling with a problem involving CTEs, although in my case
it is caused by huge *under*-estimation of cardinality rather than *over*-estimation.
The statement is quite complex, and the problem arises because there is a chain of
RECURSIVE CTEs, each defined as a query involving an earlier CTE and more tables.
Eventually there is no hope of making a good cardinality estimate.
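
To give the shape of the problem, here is a made-up skeleton of such a chain (simplified so that only the first CTE is written as recursive; apart from the CTE names, which are taken from the plan fragment below, every table and column name is invented for illustration):

WITH RECURSIVE subnet_tree AS (
        SELECT s.subnet_id, s.parent_id, 0 AS net_rel_level
        FROM   subnets s
        WHERE  s.parent_id IS NULL
    UNION ALL
        SELECT c.subnet_id, c.parent_id, t.net_rel_level + 1
        FROM   subnets c
        JOIN   subnet_tree t ON c.parent_id = t.subnet_id
),
winning_option_nums AS (          -- consumes the recursive CTE plus another table
        SELECT t.subnet_id, o.option_code, o.option_discriminator, t.net_rel_level
        FROM   subnet_tree t
        JOIN   options o ON o.subnet_id = t.subnet_id
),
subnet_inhrt_options_asc AS (     -- consumes the previous CTE plus yet another table
        SELECT w.subnet_id, w.option_code, w.option_discriminator, w.net_rel_level, a.assoc_value
        FROM   winning_option_nums w
        JOIN   option_assocs a ON a.option_code = w.option_code
)
SELECT *
FROM   winning_option_nums       winnum
JOIN   subnet_inhrt_options_asc  binoptasc
       ON  winnum.subnet_id            = binoptasc.subnet_id
       AND winnum.option_code          = binoptasc.option_code
       AND winnum.option_discriminator = binoptasc.option_discriminator
       AND winnum.net_rel_level        = binoptasc.net_rel_level;

By the time the final join is planned, both of its inputs are derived relations for which the planner has no column statistics at all, only a row-count guess.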
One CTE in particular has a cardinality estimate of 1 (I guess the actual
estimate is nearer zero and rounded up), but the actual count is over 100000.
The planner puts this CTE on the inner side of a nested loop, accessed by a simple linear CTE scan,
and the full query then takes over 20 minutes. The relevant fragment of the EXPLAIN (ANALYZE, BUFFERS) output is:
  ->  Nested Loop  (cost=0.00..0.06 rows=1 width=588) (actual time=2340.421..1201593.856 rows=105984 loops=1)
        Join Filter: ((winnum.subnet_id = binoptasc.subnet_id) AND (winnum.option_code = binoptasc.option_code) AND ((winnum.option_discriminator)::text = (binoptasc.option_discriminator)::text) AND (winnum.net_rel_level = binoptasc.net_rel_level))
        Rows Removed by Join Filter: 7001612448
        Buffers: shared hit=2290941
        ->  CTE Scan on winning_option_nums winnum  (cost=0.00..0.02 rows=1 width=536) (actual time=2338.422..2543.684 rows=62904 loops=1)
              Buffers: shared hit=2290941
        ->  CTE Scan on subnet_inhrt_options_asc binoptasc  (cost=0.00..0.02 rows=1 width=584) (actual time=0.000..9.728 rows=111308 loops=62904)
Whereas, by altering various statistics to be very wrong, the entire query runs in 21 seconds.
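
(As one concrete illustration of what "altering statistics" can mean, though not necessarily the alteration used for the 21-second run, a per-column n_distinct override deliberately set to a misleading value will distort the planner's estimates after the next ANALYZE; the names here are the invented ones from the skeleton above:)

ALTER TABLE options ALTER COLUMN subnet_id SET (n_distinct = 100000);  -- deliberately misleading override
ANALYZE options;   -- the override is applied when statistics are next gathered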
There have been several debates about how to address situations like this, where
no practical non-query-specific statistics-gathering scheme can ever hope to
gather enough statistics to model the later derived tables, e.g. the frowned-on
SELECTIVITY clause and ideas for query-specific statistics.
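
(As an aside, the usual interim workaround, sketched below with the CTE's defining query elided, is to split the statement, materialize the troublesome CTE into a temporary table, and ANALYZE it so that the rest of the query is planned against real statistics, but that forfeits the single-statement form:)

-- Hedged sketch of the materialize-and-ANALYZE workaround;
-- "<defining query>" stands for the SELECT that previously defined the CTE.
CREATE TEMPORARY TABLE winning_option_nums AS
    <defining query>;
ANALYZE winning_option_nums;     -- gives the planner a real row count and per-column stats
-- the remainder of the statement then joins against the temp table instead of the CTE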
Meanwhile, I have one other suggestion aimed specifically at problematic CTEs:
would it be reasonable to provide a new planner configuration option:

enable_nestloop_cte_inner (boolean)
Enables or disables the query planner's use of nested-loop join plans in which a CTE scan is the inner relation.
It is impossible to suppress such nested-loop joins entirely,
but turning this variable off discourages the planner from using one
if there are other methods available, such as sorting the CTE output for a merge join
or hashing it for a hash join.
The default is on.
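
(For clarity, the intended usage would mirror the existing enable_* settings. The first statement below is hypothetical, since enable_nestloop_cte_inner is only proposed here and exists in no PostgreSQL release; the closest existing knob is the much blunter enable_nestloop, which disables nested loops for every join in the statement:)

SET enable_nestloop_cte_inner = off;   -- hypothetical: the option proposed above
-- ... run the statement containing the CTE chain ...
RESET enable_nestloop_cte_inner;       -- hypothetical

SET enable_nestloop = off;             -- existing today, but affects all joins, not just CTE inners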
John
--
Sent via pgsql-performance mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance