Discussion:
Slow query: bitmap scan troubles
p***@foo.me.uk
2012-12-04 15:06:48 UTC
Permalink
Hi guys (and girls)

I've been banging my head over this for a few days now so if any of you kind
souls could take a minute to take a look at this I would be eternally
grateful.

I have a pretty straightforward query that is very slow by default, and
about 70 times faster when I set enable_bitmapscan=off. I would like to
convince the planner to use my lovely indexes.

The scenario is this: I have two tables, trade and position_effect. A trade
is a deal we do with somebody to exchange something for something else. It
has a time it was done, and is associated with a particular book for
accounting purposes. A position effect records changes to our position (e.g.
how much we have) of a particular asset. One trade can have many position
effects (usually only 1, 2 or 3).

For example, I do a trade of USD/GBP and I get two position effects: +1000
GBP and -1200 USD.


SCHEMA:
-------

The actual schema is a bit more complicated but I will put the important
parts here (if you think it important, the full schema for the two tables is
here: http://pastebin.com/6Y52aDFL):

CREATE TABLE trade
(
  id bigserial NOT NULL,
  time_executed timestamp with time zone NOT NULL,
  id_book integer NOT NULL,
  CONSTRAINT cons_trade_primary_key PRIMARY KEY (id)
);

CREATE INDEX idx_trade_id_book
ON trade
USING btree
(id_book, time_executed, id);

CREATE TABLE position_effect
(
  id bigserial NOT NULL,
  id_trade bigint NOT NULL,
  id_asset integer NOT NULL,
  quantity double precision NOT NULL,
  CONSTRAINT cons_pe_primary_key PRIMARY KEY (id_trade, id_asset)
);

SETUP:
------

These tables are relatively large (~100 million rows in position effect).
The box is a pretty beastly affair with 512MB of RAM and 4x10 2GHz cores.
The postgres configuration is here:

http://pastebin.com/48uyiak7

I am using 64-bit PostgreSQL 9.2.1, hand-compiled on a RedHat 6.2 box.

QUERY:
------

What I want to do is sum all of the position effects for a particular asset,
joined to the trade table so I can filter on the time it was executed and
the book it was traded into:

SELECT sum(position_effect.quantity)
FROM trade, position_effect
WHERE trade.id = position_effect.id_trade
AND position_effect.id_asset = 1837
AND trade.time_executed >= '2012-10-28 00:00:00'
AND trade.id_book = 41

In this case there are only 11 rows that need to be summed. If I just let
Postgres do its thing, that query takes 5000ms (which, when multiplied over
many books and assets, gets very slow). I think this is because it is
bitmapping the whole position_effect table, which is very large. If I disable
bitmap scans:

set enable_bitmapscan = off;

The query takes 43ms, and properly uses the indexes I have set up.
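
For what it's worth, the plans linked below were captured with something
along these lines (I'm writing the invocation out from memory, so treat it
as a sketch rather than the exact command):

EXPLAIN (ANALYZE, BUFFERS)
SELECT sum(position_effect.quantity)
FROM trade, position_effect
WHERE trade.id = position_effect.id_trade
AND position_effect.id_asset = 1837
AND trade.time_executed >= '2012-10-28 00:00:00'
AND trade.id_book = 41;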

Slow version with bitmapscan enabled: http://explain.depesz.com/s/6I7
Fast version with bitmapscan disabled: http://explain.depesz.com/s/4MWG
p***@foo.me.uk
2012-12-04 15:21:17 UTC
Permalink
Bad form to reply to yourself, I know, but just check-reading that for the
third time I noticed two mistakes:

- The box has 128GB of RAM, not 512MB

- There is an additional constraint on the position_effect table (though I
don't think it matters for this discussion):
CONSTRAINT cons_pe_trade FOREIGN KEY (id_trade) REFERENCES trade (id)

Sorry to clog your inboxes further!

Regards,

Philip

Claudio Freire
2012-12-04 15:27:57 UTC
Permalink
Post by p***@foo.me.uk
Slow version with bitmapscan enabled: http://explain.depesz.com/s/6I7
Fast version with bitmapscan disabled: http://explain.depesz.com/s/4MWG
If you check the "fast" plan, it has a higher estimated cost than the
"slow" plan.

The difference between the cost estimates and the actual cost of your
queries, given relatively precise row estimates, seems to suggest that your
e_c_s or r_p_c (effective_cache_size, random_page_cost) aren't a reflection
of your hardware's performance.

First, make sure caching isn't interfering with your results. Run each
query several times.

Then, if the difference persists, you may have to tweak
effective_cache_size first, maybe random_page_cost too, to better
match your I/O subsystem's actual performance.
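
For example, something along these lines in a test session (the values are
only illustrative, not a recommendation - plug in numbers that match your
box):

SET effective_cache_size = '100GB';  -- roughly what the OS can actually cache
SET random_page_cost = 1.5;          -- lower if most random reads come from cache
EXPLAIN (ANALYZE, BUFFERS) SELECT ... ;  -- re-run the query, compare the plan chosen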
Jeff Janes
2012-12-04 17:22:29 UTC
Permalink
Post by Claudio Freire
Post by p***@foo.me.uk
Slow version with bitmapscan enabled: http://explain.depesz.com/s/6I7
Fast version with bitmapscan disabled: http://explain.depesz.com/s/4MWG
If you check the "fast" plan, it has a higher cost compared against
the "slow" plan.
The difference between cost estimation and actual cost of your
queries, under relatively precise row estimates, seems to suggest your
e_c_s or r_p_c aren't a reflection of your hardware's performance.
But the row estimates are not precise at the top of the join/filter.
It thinks there will be 2120 rows, but there are only 11.

So it seems like there is a negative correlation between the two
tables which is not recognized.
Post by Claudio Freire
First, make sure caching isn't interfering with your results. Run each
query several times.
If that is not how the production system works (running the same query
over and over) then you want to model the cold cache, not the hot one.
But in any case, the posted explains indicate that all buffers were
cached.

Cheers,

Jeff
Claudio Freire
2012-12-04 17:25:56 UTC
Permalink
Post by Jeff Janes
Post by Claudio Freire
Post by p***@foo.me.uk
Slow version with bitmapscan enabled: http://explain.depesz.com/s/6I7
Fast version with bitmapscan disabled: http://explain.depesz.com/s/4MWG
If you check the "fast" plan, it has a higher cost compared against
the "slow" plan.
The difference between cost estimation and actual cost of your
queries, under relatively precise row estimates, seems to suggest your
e_c_s or r_p_c aren't a reflection of your hardware's performance.
But the row estimates are not precise at the top of the join/filter.
It thinks there will 2120 rows, but there are only 11.
Ah... I didn't spot that one...
p***@foo.me.uk
2012-12-04 17:47:29 UTC
Permalink
Post by Claudio Freire
Post by Jeff Janes
But the row estimates are not precise at the top of the join/filter.
It thinks there will 2120 rows, but there are only 11.
Ah... I didn't spot that one...
Yes, you are right there - this is probably a slightly atypical query of
this sort, actually; 2120 is a pretty good guess.

On Claudio's suggestion I have found lots more things to read up on and am
eagerly awaiting 6pm when I can bring the DB down and start tweaking. The
effective_work_mem setting is going from 6GB to 88GB, which I think will
make quite a difference.

I still can't quite wrap my head around why accessing an index is expected
to use more disk access than doing a bitmap scan of the table itself, but I
guess it does make a bit of sense if Postgres assumes the table is more
likely to be cached.

It's all quite, quite fascinating :)

I'll let you know how it goes.

- Phil
Vitalii Tymchyshyn
2012-12-04 18:50:41 UTC
Permalink
Well, you don't need to take anything down. Most settings that change
planner decisions can be tuned on a per-query basis by issuing SET commands
in the given session. This should not affect other queries more than is
needed to run the query in the way the planner chooses.
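
For example, a rough sketch (the particular settings here are only an
illustration - use whichever ones you want to experiment with):

SET effective_cache_size = '88GB';
SET enable_bitmapscan = off;
EXPLAIN (ANALYZE, BUFFERS) SELECT ... ;  -- only this session sees the changes
RESET ALL;  -- back to the values from postgresql.conf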

Best regards, Vitalii Tymchyshyn
Post by p***@foo.me.uk
Post by Claudio Freire
Post by Jeff Janes
But the row estimates are not precise at the top of the join/filter.
It thinks there will 2120 rows, but there are only 11.
Ah... I didn't spot that one...
Yes, you are right there - this is probably a slightly atypical query of
this sort actually, 2012 is a pretty good guess.
On Claudio's suggestion I have found lots more things to read up on and am
eagerly awaiting 6pm when I can bring the DB down and start tweaking. The
effective_work_mem setting is going from 6Gb->88Gb which I think will make
quite a difference.
I still can't quite wrap around my head why accessing an index is expected
to use more disk access than doing a bitmap scan of the table itself, but I
guess it does make a bit of sense if postgres assumes the table is more
likely to be cached.
It's all quite, quite fascinating :)
I'll let you know how it goes.
- Phil
--
Best regards,
Vitalii Tymchyshyn
p***@foo.me.uk
2012-12-04 18:56:04 UTC
Permalink
Ah okay, thanks. I knew I could set various things but not
effective_work_mem (I tried reloading the edited config file but it didn't
seem to pick it up)





Sergey Konoplev
2012-12-04 20:11:53 UTC
Permalink
Post by p***@foo.me.uk
eagerly awaiting 6pm when I can bring the DB down and start tweaking. The
effective_work_mem setting is going from 6Gb->88Gb which I think will make
quite a difference.
I also wonder if increasing default_statistics_target (say, 10x) or
just doing ALTER TABLE ... SET STATISTICS for particular tables will help.
It will make the planner produce more precise estimates. Do not
forget to ANALYZE after changing it.
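
For example, something like this (which columns actually matter here is
just my guess):

ALTER TABLE position_effect ALTER COLUMN id_asset SET STATISTICS 1000;
ALTER TABLE trade ALTER COLUMN id_book SET STATISTICS 1000;
ANALYZE position_effect;
ANALYZE trade;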
Post by p***@foo.me.uk
I still can't quite wrap around my head why accessing an index is expected
to use more disk access than doing a bitmap scan of the table itself, but I
guess it does make a bit of sense if postgres assumes the table is more
likely to be cached.
It's all quite, quite fascinating :)
I'll let you know how it goes.
- Phil
--
Sergey Konoplev
Database and Software Architect
http://www.linkedin.com/in/grayhemp

Phones:
USA +1 415 867 9984
Russia, Moscow +7 901 903 0499
Russia, Krasnodar +7 988 888 1979

Skype: gray-hemp
Jabber: ***@gmail.com
p***@foo.me.uk
2012-12-06 12:56:26 UTC
Permalink
I also wonder if increasing default_statistics_target (say, 10x) or just
doing ALTER TABLE ... SET STATISTICS for particular tables will help.
It will make the planner produce more precise estimates. Do not forget
to ANALYZE after changing it.

Thanks Sergey, I will try this too.

I think the bother here is that the statistics are pretty good (we do
analyse regularly and default_statistics_target is already 1000), but once I
start filtering the two tables the correlations alter quite a bit. I don't
think there is that much that can be done about that :)
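
(For reference, this is roughly the sort of thing I have been looking at to
sanity-check the per-column stats - pg_stats being, as far as I understand
it, the view over what ANALYZE collects:

SELECT tablename, attname, n_distinct, correlation
FROM pg_stats
WHERE tablename IN ('trade', 'position_effect');
)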

- Phil
Jeff Janes
2012-12-04 22:34:42 UTC
Permalink
Post by p***@foo.me.uk
Post by Claudio Freire
Post by Jeff Janes
But the row estimates are not precise at the top of the join/filter.
It thinks there will 2120 rows, but there are only 11.
Ah... I didn't spot that one...
Yes, you are right there - this is probably a slightly atypical query of
this sort actually, 2012 is a pretty good guess.
What do the timings look like on a more realistic example?
Post by p***@foo.me.uk
On Claudio's suggestion I have found lots more things to read up on and am
eagerly awaiting 6pm when I can bring the DB down and start tweaking. The
effective_work_mem setting is going from 6Gb->88Gb which I think will make
quite a difference.
You can change effective_cache_size just in your own session, or do it
globally with a "reload" or SIGHUP, no need to bring down the server.
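
For example (just a sketch):

-- per-session, takes effect immediately for this connection only:
SET effective_cache_size = '88GB';

-- or, after editing postgresql.conf, signal a reload without a restart:
SELECT pg_reload_conf();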

However, I don't think it will make much difference. Even though it
thinks it is hitting the index 14,085 times, that is still small
compared to the overall size of the table.
Post by p***@foo.me.uk
I still can't quite wrap around my head why accessing an index is expected
to use more disk access than doing a bitmap scan of the table itself,
It is only doing a bitmap scan of those parts of the table which
contain relevant data, and it is doing them in physical order, so it
expects that much of the IO it is going to do will be largely
sequential.
Post by p***@foo.me.uk
but I
guess it does make a bit of sense if postgres assumes the table is more
likely to be cached.
Unfortunately, postgres's planner doesn't know anything about that.
From your "explain" I can see in hindsight that everything you needed
was cached, but that is not information that the planner can use
(currently). And I don't know if *everything* is cached, or if just
those particular blocks are because you already ran the same query
with the same parameters recently.

Also, your work_mem is pretty low given the amount of RAM you have.

work_mem = 1MB

I don't think the current planner attempts to take account of the fact
that a bitmap scan which overflows work_mem and so becomes "lossy" is
quite a performance set-back. Nor does it look like explain analyze
informs you of this happening. But maybe I'm just looking in the
wrong places.
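
If you want to rule that out, something like this in a throwaway session
should do it (256MB is an arbitrary illustrative value, just something
comfortably bigger than the bitmap could need):

SET work_mem = '256MB';
EXPLAIN (ANALYZE, BUFFERS) SELECT ... ;  -- does the bitmap heap scan's timing change?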

Cheers,

Jeff
p***@foo.me.uk
2012-12-04 18:03:29 UTC
Permalink
Post by Jeff Janes
But the row estimates are not precise at the top of the join/filter.
It thinks there will 2120 rows, but there are only 11.
So it seems like there is a negative correlation between the two tables
which is not recognized.

Yes, you are right there. I am only just beginning to understand how to
parse these explain reports. As I mentioned above, I probably picked a bad
example to run that query on; 11 is an unusually low number of results to
get back, a few thousand would be more normal.

Though that doesn't account for the 70x difference between the actual
speeds of the two queries, given a pretty similar expected speed (does it?).
It does go some way to explaining why a bad choice of plan was made.

Is there some nice bit of literature somewhere that explains what sort of
costs are associated with the different types of lookup? I have found bits
and bobs online but I still don't have a really clear idea in my head of
what the difference between a bitmap index scan and an index-only scan is,
though I can sort of guess; I don't see why one would be considered more
likely to use the disk than the other.

On the 'slow' query (with the better predicted score)
Post by Jeff Janes
Post by Claudio Freire
First, make sure caching isn't interfering with your results. Run each
query several times.
If that is not how the production system works (running the same query
over and over) then you want to model the cold cache, not the hot one.
Post by Jeff Janes
But in any case, the posted explains indicates that all buffers were
cached.

We are in the rather pleasant situation here in that we are willing to spend
money on the box (up to a point, but quite a large point) to get it up to
the spec so that it should hardly ever need to touch the disk; the trick is
figuring out how to let our favourite database server know that.

I've just discovered pgtune and am having some fun with that too.

Cheers,

Phil
Claudio Freire
2012-12-04 18:31:50 UTC
Permalink
Post by p***@foo.me.uk
Though that doesn't account for the 70x difference between the speed of the
two queries in actuality given a pretty similar expected speed (does it?).
It does go some way to explaining why a bad choice of plan was made.
I still don't think it does. I still think the problem is the GUC settings.

The slow plan joins in a way that processes all 3M rows on both sides
of the join, and pg knows it.
The fast plan only processes 5k of them. And pg knows it. Why is it
choosing to process 3M rows?

If there's negative correlation, it only means fewer rows will be
produced, but the nested loop and the right-hand index scan still
happen.
Jeff Janes
2012-12-04 23:42:21 UTC
Permalink
Post by p***@foo.me.uk
Though that doesn't account for the 70x difference between the speed of the
two queries in actuality given a pretty similar expected speed (does it?).
It kind of does. The expected speed is predicated on the number of
rows being 200 fold higher. If the number of rows actually was that
much higher, the two speeds might be closer together. That is why it
would be interesting to see a more typical case where the actual
number of rows is closer to the 2000 estimate.

But I am curious about how the cost estimate for the primary key look
up is arrived at:

Index Scan using cons_pe_primary_key on position_effect
(cost=0.00..42.96 rows=1 width=16)

There should be a random page for the index leaf page, and a random
page for the heap page. Since you set random_page_cost to 2, that
comes up to 4. Then there would be some almost negligible CPU costs.
Where the heck is the extra 38 cost coming from?
Post by p***@foo.me.uk
It does go some way to explaining why a bad choice of plan was made.
Is there some nice bit of literature somewhere that explains what sort of
costs are associated with the different types of lookup?
I've heard good things about Greg Smith's book, but I don't know if it
covers this particular thing.

Otherwise, I don't know of a good single place which is a tutorial
rather than a reference (or the code itself)
Post by p***@foo.me.uk
Post by Jeff Janes
Post by Claudio Freire
First, make sure caching isn't interfering with your results. Run each
query several times.
If that is not how the production system works (running the same query
over and over) then you want to model the cold cache, not the hot one.
Post by Jeff Janes
But in any case, the posted explains indicates that all buffers were
cached.
We are in the rather pleasant situation here in that we are willing to spend
money on the box (up to a point, but quite a large point) to get it up to
the spec so that it should hardly ever need to touch the disk, the trick is
figuring out how to let our favourite database server know that.
Well, that part is fairly easy. Make random_page_cost and
seq_page_cost much smaller than their defaults. Like, 0.04 and 0.03,
for example.
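
That is, something like this, either per session while experimenting or in
postgresql.conf once you settle on values:

SET seq_page_cost = 0.03;
SET random_page_cost = 0.04;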

I think the *_page_cost should strictly be an estimate of actually doing
the IO, with a separate parameter to reflect the likelihood of needing to do
the IO, like *_page_cachedness. But that isn't the way it is done
currently.

Cheers,

Jeff
Jeff Janes
2012-12-05 17:39:35 UTC
Permalink
On Tue, Dec 4, 2012 at 3:42 PM, Jeff Janes <***@gmail.com> wrote:

(Regarding http://explain.depesz.com/s/4MWG, I wrote:)
Post by Jeff Janes
But I am curious about how the cost estimate for the primary key look
up is arrived at:
Index Scan using cons_pe_primary_key on position_effect
(cost=0.00..42.96 rows=1 width=16)
There should be a random page for the index leaf page, and a random
page for the heap page. Since you set random_page_cost to 2, that
comes up to 4. Then there would be some almost negligible CPU costs.
Where the heck is the extra 38 cost coming from?
I now see where the cost is coming from. In commit 21a39de5809 (first
appearing in 9.2) the "fudge factor" cost estimate for large indexes
was increased by about 10 fold, which really hits this index hard.
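
(If you are curious just how hard, something like this shows the on-disk
size of that index - assuming, as is the default, that the index backing
the constraint carries the constraint's name:

SELECT pg_size_pretty(pg_relation_size('cons_pe_primary_key'));
)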

This was fixed in commit bf01e34b556 "Tweak genericcostestimate's
fudge factor for index size", by changing it to use the log of the
index size. But that commit probably won't be shipped until 9.3.

I'm not sure that this change would fix your problem, because it might
also change the costs of the alternative plans in a way that
neutralizes things. But I suspect it would fix it. Of course, a
correct estimate of the join size would also fix it--you have kind of
a perfect storm here.

Cheers,

Jeff
Claudio Freire
2012-12-05 17:43:49 UTC
Permalink
Post by Jeff Janes
I'm not sure that this change would fix your problem, because it might
also change the costs of the alternative plans in a way that
neutralizes things. But I suspect it would fix it. Of course, a
correct estimate of the join size would also fix it--you have kind of
a perfect storm here.
As far as I can see on the explain, the misestimation is 3x~4x not 200x.
Jeff Janes
2012-12-06 17:27:48 UTC
Permalink
Post by Claudio Freire
Post by Jeff Janes
I'm not sure that this change would fix your problem, because it might
also change the costs of the alternative plans in a way that
neutralizes things. But I suspect it would fix it. Of course, a
correct estimate of the join size would also fix it--you have kind of
a perfect storm here.
As far as I can see on the explain, the misestimation is 3x~4x not 200x.
It is 3x (14085 vs 4588) for selectivity on one of the tables, "Index
Only Scan using idx_trade_id_book on trade".

But for the join of both tables it is estimate 2120 vs actual 11.

Cheers,

Jeff
Claudio Freire
2012-12-06 20:05:09 UTC
Permalink
Post by Jeff Janes
Post by Claudio Freire
Post by Jeff Janes
I'm not sure that this change would fix your problem, because it might
also change the costs of the alternative plans in a way that
neutralizes things. But I suspect it would fix it. Of course, a
correct estimate of the join size would also fix it--you have kind of
a perfect storm here.
As far as I can see on the explain, the misestimation is 3x~4x not 200x.
It is 3x (14085 vs 4588) for selectivity on one of the tables, "Index
Only Scan using idx_trade_id_book on trade".
But for the join of both tables it is estimate 2120 vs actual 11.
But the final result set isn't further worked on (except for the
aggregate), which means it doesn't affect the cost by much.
Jeff Janes
2012-12-06 21:09:54 UTC
Permalink
Post by Claudio Freire
Post by Jeff Janes
Post by Claudio Freire
As far as I can see on the explain, the misestimation is 3x~4x not 200x.
It is 3x (14085 vs 4588) for selectivity on one of the tables, "Index
Only Scan using idx_trade_id_book on trade".
But for the join of both tables it is estimate 2120 vs actual 11.
But the final result set isn't further worked on (except for the
aggregate), which means it doesn't affect the cost by much.
Good point. Both the NL and hash join do about the same amount of
work probing for success whether the success is actually there or not.

So scratch what I said about the correlation being important; in this
case it is not.

The 3x error is enough to push it over the edge, but the fudge factor
is what gets it so close to that edge in the first place.

And I'm now pretty sure the fudge factor change would fix this. The
truly-fast NL plan is getting overcharged by the fudge factor once for
each of its 14,085 loops, while the truly-slow bitmap scan is
overcharged only once for the entire scan. So the change is by no
means neutralized between the two plans.

I don't know if my other theory that the bitmap scan is overflowing
work_mem (but not costed for doing so) is also contributing.

Cheers,

Jeff
Tom Lane
2012-12-05 18:05:10 UTC
Permalink
Post by Jeff Janes
I now see where the cost is coming from. In commit 21a39de5809 (first
appearing in 9.2) the "fudge factor" cost estimate for large indexes
was increased by about 10 fold, which really hits this index hard.
This was fixed in commit bf01e34b556 "Tweak genericcostestimate's
fudge factor for index size", by changing it to use the log of the
index size. But that commit probably won't be shipped until 9.3.
Hm. To tell you the truth, in October I'd completely forgotten about
the January patch, and was thinking that the 1/10000 cost had a lot
of history behind it. But if we never shipped it before 9.2 then of
course that idea is false. Perhaps we should backpatch the log curve
into 9.2 --- that would reduce the amount of differential between what
9.2 does and what previous branches do for large indexes.

It would definitely be interesting to know if applying bf01e34b556
helps the OP's example.

regards, tom lane
p***@foo.me.uk
2012-12-06 12:52:07 UTC
Permalink
That is very interesting indeed; these indexes are quite large!

I will apply that patch and try it out this evening and let you know.

Thank you very much everyone for your time, the support has been amazing.

PS: Just looked at this thread on the archives page and realised I don't
have my name in the FROM: field, which is a misconfiguration of my email
client, but I figured I would leave it as-is to prevent confusion. Sorry
about that.

All the best,

Philip Scott

p***@foo.me.uk
2012-12-06 14:10:29 UTC
Permalink
Hi Jeff
It kind of does. The expected speed is predicated on the number of rows
being 200 fold higher. If the number of rows actually was that much higher,
the two speeds might be closer together. That is why it would be
interesting to see a more typical case where the actual number of rows is
closer to the 2000 estimate.

Ah, I see of course. Makes a lot of sense when you think about it. This has
been quite an enlightening adventure into the guts of postgres for me :)
But I am curious about how the cost estimate for the primary key look up
is arrived at:
(Dealt with in your next reply - thanks for figuring that out! I will
certainly try the patch.)
I've heard good things about Greg Smith's book, but I don't know if it
covers this particular thing.

A copy is on its way, thank you.
Post by p***@foo.me.uk
We are in the rather pleasant situation here in that we are willing to
spend money on the box (up to a point, but quite a large point) to get
it up to the spec so that it should hardly ever need to touch the
disk, the trick is figuring out how to let our favourite database server
know that.
Well, that part is fairly easy. Make random_page_cost and seq_page_cost
much smaller than their defaults. Like, 0.04 and 0.03, for example.

Yes, I have been playing a lot with that and it makes a lot of difference.
When I tweak them down I end up getting a lot of nested loops instead of
hash or merge joins, and they are much faster (presumably we might have
gotten a nested loop out of the planner if it could correctly estimate the
low number of rows returned).

I've got plenty of ammunition now to dig deeper, you guys have been
invaluable.

Cheers,

Phil
Guillaume Lelarge
2012-12-08 15:15:42 UTC
Permalink
Post by Jeff Janes
Post by p***@foo.me.uk
[...]
Is there some nice bit of literature somewhere that explains what sort of
costs are associated with the different types of lookup?
I've heard good things about Greg Smith's book, but I don't know if it
covers this particular thing.
Otherwise, I don't know of a good single place which is a tutorial
rather than a reference (or the code itself)
Greg's book is awesome. It really gives a lot of
information/tips/whatever on performance. I mostly remember all the
information about hardware, OS, PostgreSQL configuration, and such. Not
much on the EXPLAIN part.

On the EXPLAIN part, you may have better luck with some slides available
here and there.

Robert Haas gave a talk on the query planner at pgCon 2010. The audio
feed of his talk is available here:
http://www.pgcon.org/2010/audio/15%20The%20PostgreSQL%20Query%20Planner.mp3

You can also find the slides on
https://sites.google.com/site/robertmhaas/presentations

You can also read the "Explaining the Postgres Query Optimizer" talk
written by Bruce Momjian. It's available here:
http://momjian.us/main/presentations/internals.html

And finally, you can grab my slides here:
http://www.dalibo.org/_media/understanding_explain.pdf. There is more
than just slides; I tried to put a lot of information in there.
--
Guillaume
http://blog.guillaume.lelarge.info
http://www.dalibo.com
p***@foo.me.uk
2012-12-10 09:52:42 UTC
Permalink
Greg's book is awesome. It really gives a lot of information/tips/whatever on performance. I mostly remember all the information about hardware, OS, PostgreSQL configuration, and such. Not much on the EXPLAIN part.
Arrived this morning :)
http://www.pgcon.org/2010/audio/15%20The%20PostgreSQL%20Query%20Planner.mp3
https://sites.google.com/site/robertmhaas/presentations
http://momjian.us/main/presentations/internals.html
http://www.dalibo.org/_media/understanding_explain.pdf
Well that is my evenings occupied for the next week. Thank you kindly.

- Phil
Claudio Freire
2012-12-04 18:32:57 UTC
Permalink
Is this really necessary?

(looks like a no-op, unless your CPU is slow)
p***@foo.me.uk
2012-12-04 18:54:29 UTC
Permalink
Ah, okay - my reasoning was that there's a big fancy-pants RAID array
behind it that makes disk operations faster relative to CPU ones.

I'll test it and see if it actually makes any difference.

Kevin Grittner
2012-12-04 19:09:20 UTC
Permalink
Post by p***@foo.me.uk
Ah okay, thanks. I knew I could set various things but not
effective_work_mem (I tried reloading the edited config file but
it didn't seem to pick it up)
Check the server log; maybe there was a typo or capitalization
error.

To test on a single connection you should be able to just run:

SET effective_cache_size = '88GB';

By the way, one other setting that I have found a need to adjust to
get good plans is cpu_tuple_cost. In my experience, better plans
are chosen when this is in the 0.03 to 0.05 range than with the
default of 0.01.
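
For example, to try it alongside the cache-size change in a single
session (values illustrative):

SET effective_cache_size = '88GB';
SET cpu_tuple_cost = 0.03;
EXPLAIN (ANALYZE, BUFFERS) SELECT ... ;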

-Kevin