Discussion:
Surprising no use of indexes - low performance
Nicolas Charles
2013-02-14 15:35:12 UTC
Hello,

I've been struggling for several days to understand what's happening with my
database/queries, and I'm turning to greater minds for a logical answer.

I'm dealing with a fairly large database containing log information, which I
crunch to extract data, with several indexes on it that I hoped were sensible.

\d ruddersysevents
              Table "public.ruddersysevents"
       Column       |           Type           |                  Modifiers
--------------------+--------------------------+----------------------------------------------
 id                 | integer                  | not null default nextval('serial'::regclass)
 executiondate      | timestamp with time zone | not null
 nodeid             | text                     | not null
 directiveid        | text                     | not null
 ruleid             | text                     | not null
 serial             | integer                  | not null
 component          | text                     | not null
 keyvalue           | text                     |
 executiontimestamp | timestamp with time zone | not null
 eventtype          | character varying(64)    |
 policy             | text                     |
 msg                | text                     |
Indexes:
    "ruddersysevents_pkey" PRIMARY KEY, btree (id)
    "component_idx" btree (component)
    "configurationruleid_idx" btree (ruleid)
    "executiontimestamp_idx" btree (executiontimestamp)
    "keyvalue_idx" btree (keyvalue)
    "nodeid_idx" btree (nodeid)
Check constraints:
    "ruddersysevents_component_check" CHECK (component <> ''::text)
    "ruddersysevents_configurationruleid_check" CHECK (ruleid <> ''::text)
    "ruddersysevents_nodeid_check" CHECK (nodeid <> ''::text)
    "ruddersysevents_policyinstanceid_check" CHECK (directiveid <> ''::text)


It contains 11,018,592 entries, with the following patterns:
108492 distinct executiontimestamp
14 distinct nodeid
59 distinct directiveid
26 distinct ruleid
35 distinct serial
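
(These cardinalities can be gathered with a query along these lines; a sketch, not
necessarily the exact statement used:)

select count(*)                           as total_rows,
       count(distinct executiontimestamp) as distinct_executiontimestamp,
       count(distinct nodeid)             as distinct_nodeid,
       count(distinct directiveid)        as distinct_directiveid,
       count(distinct ruleid)             as distinct_ruleid,
       count(distinct serial)             as distinct_serial
from ruddersysevents;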

Related table/index size are
relation | size
----------------------------------------+---------
public.ruddersysevents | 3190 MB
public.nodeid_idx | 614 MB
public.configurationruleid_idx | 592 MB
public.ruddersysevents_pkey | 236 MB
public.executiontimestamp_idx | 236 MB


I'm crunching the data by looking for each
nodeid/ruleid/directiveid/serial with an executiontimestamp in an interval:

explain analyze select executiondate, nodeid, ruleid, directiveid,
serial, component, keyValue, executionTimeStamp, eventtype, policy, msg
from RudderSysEvents where 1=1 and nodeId =
'31264061-5ecb-4891-9aa4-83824178f43d' and ruleId =
'61713ff1-aa6f-4c86-b3cb-7012bee707dd' and serial = 10 and
executiontimestamp between to_timestamp('2012-11-22 16:00:16.005',
'YYYY-MM-DD HH24:MI:SS.MS') and to_timestamp('2013-01-25 18:53:52.467',
'YYYY-MM-DD HH24:MI:SS.MS') ORDER BY executionTimeStamp asc;
Sort (cost=293125.41..293135.03 rows=3848 width=252) (actual
time=28628.922..28647.952 rows=62403 loops=1)
Sort Key: executiontimestamp
Sort Method: external merge Disk: 17480kB
-> Bitmap Heap Scan on ruddersysevents (cost=74359.66..292896.27
rows=3848 width=252) (actual time=1243.150..28338.927 rows=62403 loops=1)
Recheck Cond: ((nodeid =
'31264061-5ecb-4891-9aa4-83824178f43d'::text) AND (ruleid =
'61713ff1-aa6f-4c86-b3cb-7012bee707dd'::text))
Filter: ((serial = 10) AND (executiontimestamp >=
to_timestamp('2012-11-22 16:00:16.005'::text, 'YYYY-MM-DD
HH24:MI:SS.MS'::text)) AND (executiontimestamp <=
to_timestamp('2013-01-25 18:53:52.467'::text, 'YYYY-MM-DD
HH24:MI:SS.MS'::text)))
-> BitmapAnd (cost=74359.66..74359.66 rows=90079 width=0)
(actual time=1228.610..1228.610 rows=0 loops=1)
-> Bitmap Index Scan on nodeid_idx
(cost=0.00..25795.17 rows=716237 width=0) (actual time=421.365..421.365
rows=690503 loops=1)
Index Cond: (nodeid =
'31264061-5ecb-4891-9aa4-83824178f43d'::text)
-> Bitmap Index Scan on configurationruleid_idx
(cost=0.00..48562.32 rows=1386538 width=0) (actual time=794.490..794.490
rows=1381391 loops=1)
Index Cond: (ruleid =
'61713ff1-aa6f-4c86-b3cb-7012bee707dd'::text)
Total runtime: 28657.352 ms



I'm surprised that the executiontimestamp index is not used, since it
seems to be where most of the query time is spent.
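
(For reference, per-index scan counts can be checked with something like the
following, using the standard pg_stat_user_indexes statistics view; idx_scan
stays at 0 for indexes the planner never touches:)

select indexrelname, idx_scan, idx_tup_read
from pg_stat_user_indexes
where relname = 'ruddersysevents';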

For all my tests, I removed all the incoming logs, so that this table
receives only selects and no writes.

I'm using Postgres 8.4 on a quite smallish VM, with some processes running,
with the following non-default configuration:
shared_buffers = 112MB
work_mem = 8MB
maintenance_work_mem = 48MB
max_stack_depth = 3MB
wal_buffers = 1MB
effective_cache_size = 128MB
checkpoint_segments = 6

Increasing shared_buffers to 384 MB, 1 GB or 1500 MB didn't improve
performance (less than 10%). I would have expected it to, since the indexes
would then all fit in RAM.
Every explain analyze made after a change of configuration or indexes was
run after a complete log-crunching batch of our application, to be sure the
statistics were correct.
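
(When in doubt, the planner statistics for the table can also be refreshed by
hand, for instance:)

-- refresh planner statistics and reclaim dead rows on the table
vacuum analyze ruddersysevents;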

I tested further with the indexes, and reached these results:
1/ Dropping the single-column indexes "configurationruleid_idx" btree (ruleid),
"executiontimestamp_idx" btree (executiontimestamp), "keyvalue_idx" btree
(keyvalue) and "nodeid_idx" btree (nodeid), and replacing them with one
composite index, lowered the performance:

create index composite_idx on ruddersysevents (executiontimestamp,
ruleid, serial, nodeid);

Index Scan using composite_idx on ruddersysevents (cost=0.01..497350.22
rows=3729 width=252) (actual time=7.989..83717.901 rows=62403 loops=1)
Index Cond: ((executiontimestamp >= to_timestamp('2012-11-22
16:00:16.005'::text, 'YYYY-MM-DD HH24:MI:SS.MS'::text)) AND
(executiontimestamp <= to_timestamp('2013-01-25 18:53:52.467'::text,
'YYYY-MM-DD HH24:MI:SS.MS'::text)) AND (ruleid =
'61713ff1-aa6f-4c86-b3cb-7012bee707dd'::text) AND (serial = 10) AND
(nodeid = '31264061-5ecb-4891-9aa4-83824178f43d'::text))


2/ Removing nodeid from the composite index lowered the performance again:

create index composite2_idx on ruddersysevents (executiontimestamp,
ruleid, serial);

Index Scan using composite2_idx on ruddersysevents
(cost=0.01..449948.98 rows=3729 width=252) (actual
time=23.507..84888.349 rows=62403 loops=1)
Index Cond: ((executiontimestamp >= to_timestamp('2012-11-22
16:00:16.005'::text, 'YYYY-MM-DD HH24:MI:SS.MS'::text)) AND
(executiontimestamp <= to_timestamp('2013-01-25 18:53:52.467'::text,
'YYYY-MM-DD HH24:MI:SS.MS'::text)) AND (ruleid =
'61713ff1-aa6f-4c86-b3cb-7012bee707dd'::text) AND (serial = 10))
Filter: (nodeid = '31264061-5ecb-4891-9aa4-83824178f43d'::text)

3/ Removing executiontimestamp from the composite index makes the query
perform better at the beginning of the run (around 17 seconds), but over
time it degrades (I log queries longer than 20 seconds; they are very rare
in the first half of the batch and become more and more common towards the
end) down to what is shown below:

create index composite3_idx on ruddersysevents (ruleid, serial, nodeid);
Sort (cost=24683.44..24693.07 rows=3849 width=252) (actual
time=60454.558..60474.013 rows=62403 loops=1)
Sort Key: executiontimestamp
Sort Method: external merge Disk: 17480kB
-> Bitmap Heap Scan on ruddersysevents (cost=450.12..24454.23
rows=3849 width=252) (actual time=146.065..60249.143 rows=62403 loops=1)
Recheck Cond: ((ruleid =
'61713ff1-aa6f-4c86-b3cb-7012bee707dd'::text) AND (serial = 10) AND
(nodeid = '31264061-5ecb-4891-9aa4-83824178f43d'::text))
Filter: ((executiontimestamp >= to_timestamp('2012-11-22
16:00:16.005'::text, 'YYYY-MM-DD HH24:MI:SS.MS'::text)) AND
(executiontimestamp <= to_timestamp('2013-01-25 18:53:52.467'::text,
'YYYY-MM-DD HH24:MI:SS.MS'::text)))
-> Bitmap Index Scan on composite3_idx (cost=0.00..449.15
rows=6635 width=0) (actual time=129.102..129.102 rows=62403 loops=1)
Index Cond: ((ruleid =
'61713ff1-aa6f-4c86-b3cb-7012bee707dd'::text) AND (serial = 10) AND
(nodeid = '31264061-5ecb-4891-9aa4-83824178f43d'::text))
Total runtime: 60484.022 ms


With all these tests, the total processing time of the whole batch with
each index variant stays within a 10% margin (2h10 to 2h25), even though
most of the time is spent on the SQL queries.

So my question is:
"Why does *not* indexing the column which is not used make the query slower
over time, while not slowing down the application?"

If you have some clues, I'm really interested.

Best regards
Nicolas
Ireneusz Pluta
2013-02-14 18:25:54 UTC
Post by Nicolas Charles
explain analyze select executiondate, nodeid, ruleid, directiveid, serial, component, keyValue,
executionTimeStamp, eventtype, policy, msg from RudderSysEvents where 1=1 and nodeId =
'31264061-5ecb-4891-9aa4-83824178f43d' and ruleId = '61713ff1-aa6f-4c86-b3cb-7012bee707dd' and
serial = 10 and executiontimestamp between to_timestamp('2012-11-22 16:00:16.005', 'YYYY-MM-DD
HH24:MI:SS.MS') and to_timestamp('2013-01-25 18:53:52.467', 'YYYY-MM-DD HH24:MI:SS.MS') ORDER BY
executionTimeStamp asc;
[...]
I'm surprised that the executiontimestamp index is not used, since it seems to be where most of
the query time is spent.
This usage pattern is quite similar to one I used to have problems with. The key problem here is
that the planner wants to BitmapAnd indexes that are spread over the whole table, across all
timestamp values, regardless of the fact that you are interested in only a narrow timestamp window,
and it is quite aggressive about using the bitmapscan feature. So the planner needs to be directed
more precisely.

You could try the above again with:

SET enable_bitmapscan TO off ?

It helped in my case.
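
If you only want to disable it for that one query, SET LOCAL inside a transaction keeps the change
scoped to that transaction (a sketch):

begin;
set local enable_bitmapscan = off;  -- reverts automatically at commit or rollback
-- run the select ... order by executionTimeStamp here
commit;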

You may also try enclosing the timestamp condition in a "preselecting" CTE, and doing the rest of
the finer filtering outside of it, like:

with p as (
    select * from RudderSysEvents
    where executiontimestamp between '2012-11-22 16:00:16.005'
                                 and '2013-01-25 18:53:52.467'
)
select executiondate, nodeid, ruleid, directiveid, serial, component, keyValue,
       executionTimeStamp, eventtype, policy, msg
from p
where nodeId = '31264061-5ecb-4891-9aa4-83824178f43d'
  and ruleId = '61713ff1-aa6f-4c86-b3cb-7012bee707dd'
  and serial = 10;

As a side note, I think that all your indexes, except the timestamp one, are unnecessary, because
of the low cardinality of their values and, as you can see, the confusion they cause the planner.

Eventually, you may use one of the columns (the one which is always used for filtering) as the
second column of a two-column index together with the timestamp, and add its filter condition
inside the CTE part.
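
For instance, assuming nodeid is the column that is always filtered on (the index name below is
just an example):

create index executiontimestamp_nodeid_idx
    on ruddersysevents (executiontimestamp, nodeid);

with p as (
    select * from RudderSysEvents
    where executiontimestamp between '2012-11-22 16:00:16.005'
                                 and '2013-01-25 18:53:52.467'
      and nodeid = '31264061-5ecb-4891-9aa4-83824178f43d'
)
select executiondate, nodeid, ruleid, directiveid, serial, component, keyValue,
       executionTimeStamp, eventtype, policy, msg
from p
where ruleId = '61713ff1-aa6f-4c86-b3cb-7012bee707dd'
  and serial = 10;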

HTH,
Irek.
Jeff Janes
2013-02-14 19:27:33 UTC
On Thu, Feb 14, 2013 at 7:35 AM, Nicolas Charles
Post by Nicolas Charles
108492 distinct executiontimestamp
14 distinct nodeid
59 distinct directiveid
26 distinct ruleid
35 distinct serial
How many entries fall within a typical query interval of executiontimestamp?

...
Post by Nicolas Charles
I'm surprised that the executiontimestamp index is not used, since it seems
to be where most of the query time is spent.
I do not draw that conclusion from your posted information. Can you
highlight the parts of it that lead you to this conclusion?
Post by Nicolas Charles
For all my tests, I removed all the incoming logs, so that this table has
only selects and no writes
I'm using Postgres 8.4, on a quite smallish VM, with some process runnings,
A lot of improvements have been made since 8.4 which would make this
kind of thing easier to figure out. What is smallish?
Post by Nicolas Charles
with the following non default configuration
shared_buffers = 112MB
work_mem = 8MB
maintenance_work_mem = 48MB
max_stack_depth = 3MB
wal_buffers = 1MB
effective_cache_size = 128MB
effective_cache_size seems small unless you expect to have a lot of
this type of query running simultaneously, assuming you have at least
4GB of RAM, which I'm guessing you do based on your next comments.
Post by Nicolas Charles
checkpoint_segments = 6
Increasing the shared_buffers to 384, 1GB or 1500MB didn't improve the
performances (less than 10%). I would have expected it to improve, since the
indexes would all fit in RAM
If the indexes fit in RAM, they fit in RAM. If anything, increasing
shared_buffers could make it harder to fit them entirely in RAM. If
your shared buffers undergo a lot of churn, then the OS cache and the
shared buffers tend to uselessly mirror each other, meaning there is
less space for non-redundant pages.
Post by Nicolas Charles
create index composite_idx on ruddersysevents (executiontimestamp, ruleid,
serial, nodeid);
I wouldn't expect this to work well for this particular query. Since
the leading column is used in a range test, the following columns
cannot be used efficiently in the index structure. You should put the
equality-tested columns at the front of the index and the range-tested
one at the end of it.
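
For this query, that would mean something along these lines (the index name is arbitrary):

-- equality-tested columns first, the range-tested timestamp last
create index composite_eq_first_idx
    on ruddersysevents (nodeid, ruleid, serial, executiontimestamp);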
Post by Nicolas Charles
2/ Removing nodeid from the index did lower again the perf
create index composite2_idx on ruddersysevents (executiontimestamp, ruleid,
serial);
I doubt that 84888.349 vs 83717.901 is really a meaningful difference.
Post by Nicolas Charles
3/ Removing executiontimestamp from the composite index makes the query
performs better at the begining of its uses (around 17 secondes), but over
time it degrades (I'm logging query longer than 20 secondes, and there are
very rare in the first half of the batch, and getting more and more common
at the end) to what is below
If the batch processing adds data, it is not surprising the query
slows down. It looks like it is still faster at the end than the
previous two cases, right?
Post by Nicolas Charles
"Why *not* indexing the column which is not used makes the query slower over
time, while not slowing the application?"
I don't know what column you are referring to here. But it sounds
like you think that dropping the leading column from an index is a
minor change. It is not. It makes a fundamentally different index.

Cheers,

Jeff
Nicolas Charles
2013-02-15 09:00:24 UTC
Post by Jeff Janes
On Thu, Feb 14, 2013 at 7:35 AM, Nicolas Charles
Post by Nicolas Charles
108492 distinct executiontimestamp
14 distinct nodeid
59 distinct directiveid
26 distinct ruleid
35 distinct serial
How many entries fall within a typical query interval of executiontimestamp?
Around 65,000 entries.
Post by Jeff Janes
Post by Nicolas Charles
I'm surprised that the executiontimestamp index is not used, since it seems
to be where most of the query time is spent.
I do not draw that conclusion from your posted information. Can you
highlight the parts of it that lead you to this conclusion?
The index scans are on nodeid_idx and configurationruleid_idx, not on
executiontimestamp.
Or am I misreading the output?
Post by Jeff Janes
Post by Nicolas Charles
For all my tests, I removed all the incoming logs, so that this table has
only selects and no writes
I'm using Postgres 8.4, on a quite smallish VM, with some process runnings,
A lot of improvements have been made since 8.4 which would make this
kind of thing easier to figure out. What is smallish?
A VM with 1 core, 2 GB RAM, a single hard drive
Post by Jeff Janes
Post by Nicolas Charles
with the following non default configuration
shared_buffers = 112MB
work_mem = 8MB
maintenance_work_mem = 48MB
max_stack_depth = 3MB
wal_buffers = 1MB
effective_cache_size = 128MB
effective_cache_size seems small unless you expect to have a lot of
this type of query running simultaneously, assuming you have at least
4GB of RAM, which I'm guessing you do based on your next comments.
For the sake of the test, the VM got its memory increased, with no
significant changes
Post by Jeff Janes
Post by Nicolas Charles
checkpoint_segments = 6
Increasing the shared_buffers to 384, 1GB or 1500MB didn't improve the
performances (less than 10%). I would have expected it to improve, since the
indexes would all fit in RAM
If the indexes fit in RAM, they fit in RAM. If anything, increasing
shared_buffers could make it harder to fit them entirely in RAM. If
your shared buffers undergo a lot of churn, then the OS cache and the
shared buffers tend to uselessly mirror each other, meaning there is
less space for non-redundant pages.
Oh!
So I completely misunderstood the meaning of shared_buffers; I figured
that it was somehow the place where the data (like indexes) would be
stored by postgres.
Post by Jeff Janes
Post by Nicolas Charles
create index composite_idx on ruddersysevents (executiontimestamp, ruleid,
serial, nodeid);
I wouldn't expect this to work well for this particular query. Since
the leading column is used in a range test, the following columns
cannot be used efficiently in the index structure. You should put the
equality-tested columns at the front of the index and the range-tested
one at the end of it.
Post by Nicolas Charles
2/ Removing nodeid from the index did lower again the perf
create index composite2_idx on ruddersysevents (executiontimestamp, ruleid,
serial);
I doubt that 84888.349 vs 83717.901 is really a meaningful difference.
Post by Nicolas Charles
3/ Removing executiontimestamp from the composite index makes the query
performs better at the begining of its uses (around 17 secondes), but over
time it degrades (I'm logging query longer than 20 secondes, and there are
very rare in the first half of the batch, and getting more and more common
at the end) to what is below
If the batch processing adds data, it is not surprising the query
slows down. It looks like it is still faster at the end then the
previous two cases, right?
Actually, the batch reads data from this table and writes into another,
so the content of the table doesn't change at all.
And yes, it is faster than the two previous cases.
Post by Jeff Janes
Post by Nicolas Charles
"Why *not* indexing the column which is not used makes the query slower over
time, while not slowing the application?"
I don't know what column you are referring to here. But it sounds
like you think that dropping the leading column from an index is a
minor change. It is not. It makes a fundamentally different index.
I was referring to the executionTimeStamp column. I know it is a HUGE
change, but it's clearly not behaving the way I thought.
With your remark I understand a little better what is going on, and I
can test what I'm doing more carefully.

Thank you !
Nicolas
Jeff Janes
2013-02-15 18:04:07 UTC
On Fri, Feb 15, 2013 at 1:00 AM, Nicolas Charles
Post by Nicolas Charles
Post by Jeff Janes
On Thu, Feb 14, 2013 at 7:35 AM, Nicolas Charles
Post by Nicolas Charles
108492 distinct executiontimestamp
14 distinct nodeid
59 distinct directiveid
26 distinct ruleid
35 distinct serial
How many entries fall within a typical query interval of
executiontimestamp?
Around 65,000 entries.
Post by Jeff Janes
Post by Nicolas Charles
I'm surprised that the executiontimestamp index is not used, since it seems
to be where most of the query time is spent.
I do not draw that conclusion from your posted information. Can you
highlight the parts of it that lead you to this conclusion?
The index scan are on nodeid_idx and configurationruleid_idx, not on
executiontimestamp
Or am I misreading the output ?
You are correct about the use of nodeid_idx and
configurationruleid_idx. But the part about most of the time going
into filtering out rows that fail to match executiontimestamp is the
part that I don't think is supported by the explained plan. The
information given by the explain plan is not sufficient to decide that
one way or the other. Some more recent improvements in "explain
(analyze, buffers)" would make it easier (especially with
track_io_timing on) but it would still be hard to draw a definitive
conclusion. The most definitive thing would be to do the experiment
of adding executiontimestamp as a *trailing* column to the end of one
of the existing indexes and see what happens.
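
For example, taking nodeid_idx as the index to extend (the index name below is just an example):

-- same leading column as nodeid_idx, with executiontimestamp appended as a trailing column
create index nodeid_ts_idx
    on ruddersysevents (nodeid, executiontimestamp);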
Post by Nicolas Charles
Post by Jeff Janes
If the indexes fit in RAM, they fit in RAM. If anything, increasing
shared_buffers could make it harder to fit them entirely in RAM. If
your shared buffers undergo a lot of churn, then the OS cache and the
shared buffers tend to uselessly mirror each other, meaning there is
less space for non-redundant pages.
Oh !
So I completely misunderstood the meaning of shared_buffer; I figured that
it was somehow the place where the data would be stored by postgres (like
indexes)
That is correct, it is the space used by postgres to cache data. But,
the rest of RAM (beyond shared_buffers) will also be used to cache
data; but by the OS rather than by postgres. On a dedicated server,
the OS will generally choose to (or at least attempt to) use this
space for the benefit of postgres anyway. If shared_buffers > RAM/2,
it won't be very successful at this, but it will still try. The
kernel and postgres do not have intimate knowledge of each other, so
it is hard to arrange that all pages show up in just one place or the
other but not both.

Cheers,

Jeff
Nicolas Charles
2013-02-16 09:25:30 UTC
Post by Jeff Janes
On Fri, Feb 15, 2013 at 1:00 AM, Nicolas Charles
Post by Nicolas Charles
Post by Jeff Janes
I'm surprised that the executiontimestamp index is not used, since it seems
to be where most of the query time is spent.
I do not draw that conclusion from your posted information. Can you
highlight the parts of it that lead you to this conclusion?
The index scan are on nodeid_idx and configurationruleid_idx, not on
executiontimestamp
Or am I misreading the output ?
You are correct about the use of nodeid_idx and
configurationruleid_idx. But the part about most of the time going
into filtering out rows that fail to match executiontimestamp is the
part that I don't think is supported by the explained plan. The
information given by the explain plan is not sufficient to decide that
one way or the other. Some more recent improvements in "explain
(analyze, buffers)" would make it easier (especially with
track_io_timing on) but it would still be hard to draw a definitive
conclusion. The most definitive thing would be to do the experiment
of adding executiontimestamp as a *trailing* column to the end of one
of the existing indexes and see what happens.
I added this index
"composite_idx" btree (ruleid, serial, executiontimestamp)
and lowered shared_buffers to 54 MB.

For this specific query, it is indeed a bit better. (23s against 28s)

Sort (cost=43449.44..43459.07 rows=3854 width=252) (actual
time=23375.247..23394.704 rows=62403 loops=1)
Sort Key: executiontimestamp
Sort Method: external merge Disk: 17480kB
-> Bitmap Heap Scan on ruddersysevents (cost=28884.44..43219.89
rows=3854 width=252) (actual time=1528.704..23155.991 rows=62403 loops=1)
Recheck Cond: ((ruleid =
'61713ff1-aa6f-4c86-b3cb-7012bee707dd'::text) AND (serial = 10) AND
(executiontimestamp >= to_timestamp('2012-11-22 16:00:16.005'::text,
'YYYY-MM-DD HH24:MI:SS.MS'::text)) AND (executiontimestamp <=
to_timestamp('2013-01-25 18:53:52.467'::text, 'YYYY-MM-DD
HH24:MI:SS.MS'::text)) AND (nodeid =
'31264061-5ecb-4891-9aa4-83824178f43d'::text))
-> BitmapAnd (cost=28884.44..28884.44 rows=3854 width=0)
(actual time=1517.310..1517.310 rows=0 loops=1)
-> Bitmap Index Scan on composite_idx
(cost=0.00..3062.16 rows=59325 width=0) (actual time=775.418..775.418
rows=811614 loops=1)
Index Cond: ((ruleid =
'61713ff1-aa6f-4c86-b3cb-7012bee707dd'::text) AND (serial = 10) AND
(executiontimestamp >= to_timestamp('2012-11-22 16:00:16.005'::text,
'YYYY-MM-DD HH24:MI:SS.MS'::text)) AND (executiontimestamp <=
to_timestamp('2013-01-25 18:53:52.467'::text, 'YYYY-MM-DD
HH24:MI:SS.MS'::text)))
-> Bitmap Index Scan on nodeid_idx (cost=0.00..25820.11
rows=717428 width=0) (actual time=714.254..714.254 rows=690503 loops=1)
Index Cond: (nodeid =
'31264061-5ecb-4891-9aa4-83824178f43d'::text)
Total runtime: 23419.411 ms
(11 rows)

But since there were a lot of "Sort Method: external merge" spills, and a
lot of object instantiations in our batch, we tried to split the job into
5-day intervals, so the average number of lines returned is close to 5000.
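
(The 5-day windows can also be generated on the database side, for example
with generate_series; a sketch:)

select g                     as window_start,
       g + interval '5 days' as window_end
from generate_series('2012-11-22 16:00:16'::timestamptz,
                     '2013-01-25 18:53:52'::timestamptz,
                     interval '5 days') as g;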

Load/IO of the system was *significantly* lower during the batch
treatment with this index than with the previous attempt (load around .7
instead of 1.8)

And batch execution time is now 1h08 rather than 2h20
Post by Jeff Janes
Post by Nicolas Charles
Post by Jeff Janes
If the indexes fit in RAM, they fit in RAM. If anything, increasing
shared_buffers could make it harder to fit them entirely in RAM. If
your shared buffers undergo a lot of churn, then the OS cache and the
shared buffers tend to uselessly mirror each other, meaning there is
less space for non-redundant pages.
Oh !
So I completely misunderstood the meaning of shared_buffer; I figured that
it was somehow the place where the data would be stored by postgres (like
indexes)
That is correct, it is the space used by postgres to cache data. But,
the rest of RAM (beyond shared_buffers) will also be used to cache
data; but by the OS rather than by postgres. On a dedicated server,
the OS will generally choose to (or at least attempt to) use this
space for the benefit of postgres anyway. If shared_buffers > RAM/2,
it won't be very successful at this, but it will still try. The
kernel and postgres do not have intimate knowledge of each other, so
it is hard to arrange that all pages show up in just one place or the
other but not both.
Thank you very much for your explanations!

Nicolas
Merlin Moncure
2013-02-15 21:27:37 UTC
On Thu, Feb 14, 2013 at 9:35 AM, Nicolas Charles
Post by Nicolas Charles
explain analyze select executiondate, nodeid, ruleid, directiveid, serial,
component, keyValue, executionTimeStamp, eventtype, policy, msg from
RudderSysEvents where 1=1 and nodeId =
'31264061-5ecb-4891-9aa4-83824178f43d' and ruleId =
'61713ff1-aa6f-4c86-b3cb-7012bee707dd' and serial = 10 and
executiontimestamp between to_timestamp('2012-11-22 16:00:16.005',
'YYYY-MM-DD HH24:MI:SS.MS') and to_timestamp('2013-01-25 18:53:52.467',
'YYYY-MM-DD HH24:MI:SS.MS') ORDER BY executionTimeStamp asc;
[...]
"Why does *not* indexing the column which is not used make the query slower
over time, while not slowing down the application?"
If you have some clues, I'm really interested.
This query can be optimized to zero (assuming data in cache and not
too many dead rows).

create one index on nodeId, ruleId, serial, executiontimestamp columns and do:

SELECT
executiondate, nodeid, ruleid, directiveid, serial, component,
keyValue, executionTimeStamp, eventtype, policy, msg
FROM RudderSysEvents
WHERE (nodeId, ruleId, serial, executiontimestamp) >= (
'31264061-5ecb-4891-9aa4-83824178f43d',
'61713ff1-aa6f-4c86-b3cb-7012bee707dd',
10, to_timestamp('2012-11-22 16:00:16.005', 'YYYY-MM-DD HH24:MI:SS.MS'))
AND (nodeId, ruleId, serial, executiontimestamp) <= (
'31264061-5ecb-4891-9aa4-83824178f43d',
'61713ff1-aa6f-4c86-b3cb-7012bee707dd',
10, to_timestamp('2013-01-25 18:53:52.467', 'YYYY-MM-DD HH24:MI:SS.MS'))
ORDER BY nodeId, ruleId, serial, executiontimestamp
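
The single index referred to above would be something like (the name is arbitrary):

create index nodeid_ruleid_serial_ts_idx
    on ruddersysevents (nodeid, ruleid, serial, executiontimestamp);

Since the ORDER BY matches the index column order, an index scan can return the
rows already sorted, so the external merge sort should disappear as well.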

merlin