Slow query when used in a view

Discussion:

(too old to reply)

Shaun Thomas

2013-03-11 22:29:13 UTC

Hey everyone!

A developer was complaining about a view he created to abstract an added
column in a left join. He was contemplating denormalizing the added
value into the parent table and using a trigger to maintain it instead,
and I obviously looked into the problem. I noticed the view was
incurring a sequence scan on an obvious index condition, but the regular
join version was not.

Curious, I whipped up this test case:

CREATE TABLE foo (id BIGINT, small_label VARCHAR);
INSERT INTO foo (id) VALUES (generate_series(1, 10000));
ALTER TABLE foo ADD CONSTRAINT pk_foo_id PRIMARY KEY (id);

CREATE TABLE bar (id BIGINT, foo_id BIGINT);

INSERT INTO bar (id, foo_id)
SELECT a, a%10000
FROM generate_series(1, 100000) a;

ALTER TABLE bar ADD CONSTRAINT pk_bar_id PRIMARY KEY (id);

CREATE TABLE tiny_foo (small_label VARCHAR NOT NULL PRIMARY KEY);
INSERT INTO tiny_foo (small_label)
VALUES (('yes', 'we', 'have', 'no', 'bananas'));

UPDATE foo SET small_label = 'bananas' WHERE id=750;

ANALYZE foo;
ANALYZE bar;
ANALYZE tiny_foo;

CREATE VIEW v_slow_view AS
SELECT foo.*, tf.small_label IS NOT NULL AS has_small_label
FROM foo
LEFT JOIN tiny_foo tf USING (small_label);

Now, this is with PostgreSQL 9.1.8, basically default everything in a
base Ubuntu install. So, the good query plan using all tables directly:

SELECT bar.*, foo.*, tf.small_label IS NOT NULL AS has_small_label
FROM bar
LEFT JOIN foo ON (foo.id = bar.foo_id)
LEFT JOIN tiny_foo tf USING (small_label)
WHERE bar.id IN (750, 1750, 2750)
ORDER BY bar.id;

does this:

Index Scan using pk_foo_id on foo (cost=0.00..8.27 rows=1 width=16)
Index Cond: (id = bar.foo_id)

The bad one using the view:

SELECT bar.*, sv.*
FROM bar
LEFT JOIN v_slow_view sv ON (sv.id = bar.foo_id)
WHERE bar.id IN (750, 1750, 2750)
ORDER BY bar.id;

Mysteriously, does this:

Seq Scan on foo (cost=0.00..145.00 rows=10000 width=16)

I'm... perplexed. This test case is way too shallow to be affected by
join_collapse_limit and its ilk, so I'm not sure what's going on here. I
sense an optimization fence, but I can't see where.

Thanks in advance!
--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-676-8870
***@optionshouse.com

______________________________________________

See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email

--
Sent via pgsql-performance mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Tom Lane

2013-03-11 23:56:59 UTC

Permalink

Post by Shaun Thomas
A developer was complaining about a view he created to abstract an added
column in a left join. ...
CREATE VIEW v_slow_view AS
SELECT foo.*, tf.small_label IS NOT NULL AS has_small_label
FROM foo
LEFT JOIN tiny_foo tf USING (small_label);
SELECT bar.*, foo.*, tf.small_label IS NOT NULL AS has_small_label
FROM bar
LEFT JOIN foo ON (foo.id = bar.foo_id)
LEFT JOIN tiny_foo tf USING (small_label)
WHERE bar.id IN (750, 1750, 2750)
ORDER BY bar.id;
SELECT bar.*, sv.*
FROM bar
LEFT JOIN v_slow_view sv ON (sv.id = bar.foo_id)
WHERE bar.id IN (750, 1750, 2750)
ORDER BY bar.id;

These queries are not actually equivalent. In the first one, it is
impossible for "has_small_label" to read out as NULL: it will either be
true or false. However, in the second one, the IS NOT NULL is evaluated
below the LEFT JOIN to "sv", and therefore it is required that the query
return NULL for "has_small_label" in any row where bar.foo_id lacks a
join partner.

To implement that behavior correctly, we're forced to form the
foo-to-tiny_foo join first, then do the left join with bar (which'll
replace RHS columns by nulls where necessary).

And that means that you get the inefficient plan wherein the
foo-to-tiny_foo join is computed in its entirety.

9.2 does this case better, by virtue of the "parameterized plan" stuff,
which exists specifically to let us use nestloop-with-inner-indexscan
plans even when there are some join order restrictions complicating
matters.

regards, tom lane

--
Sent via pgsql-performance mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Shaun Thomas

2013-03-12 13:49:04 UTC

Permalink

Post by Tom Lane
And that means that you get the inefficient plan wherein the
foo-to-tiny_foo join is computed in its entirety.

:(

That's unfortunate, though I guess it makes sense. I moved the join in
the view into the SELECT clause as an EXISTS, and that seems to work
without major adverse effects. Apparently the tiny table really will be
tiny in actual use, so impact should be minimal.

I just really don't like using subselects that way. :)

Thanks, Tom!
--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-676-8870
***@optionshouse.com

______________________________________________

See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email

--
Sent via pgsql-performance mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance