Discussion:
Surprising no use of indexes - low performance
Nicolas Charles
2013-02-14 15:35:12 UTC
Hello,

I've been struggling for several days to understand what's happening with my
database/queries, and I'm turning to greater minds for a logical answer.

I'm dealing with a fairly large database containing log information, which I
crunch to extract data, with several indexes on it that I hoped were sensible.

\d ruddersysevents
              Table "public.ruddersysevents"
       Column       |           Type           |                  Modifiers
--------------------+--------------------------+----------------------------------------------
 id                 | integer                  | not null default nextval('serial'::regclass)
 executiondate      | timestamp with time zone | not null
 nodeid             | text                     | not null
 directiveid        | text                     | not null
 ruleid             | text                     | not null
 serial             | integer                  | not null
 component          | text                     | not null
 keyvalue           | text                     |
 executiontimestamp | timestamp with time zone | not null
 eventtype          | character varying(64)    |
 policy             | text                     |
 msg                | text                     |
Indexes:
    "ruddersysevents_pkey" PRIMARY KEY, btree (id)
    "component_idx" btree (component)
    "configurationruleid_idx" btree (ruleid)
    "executiontimestamp_idx" btree (executiontimestamp)
    "keyvalue_idx" btree (keyvalue)
    "nodeid_idx" btree (nodeid)
Check constraints:
    "ruddersysevents_component_check" CHECK (component <> ''::text)
    "ruddersysevents_configurationruleid_check" CHECK (ruleid <> ''::text)
    "ruddersysevents_nodeid_check" CHECK (nodeid <> ''::text)
    "ruddersysevents_policyinstanceid_check" CHECK (directiveid <> ''::text)


It contains 11,018,592 entries, with the following patterns:
108492 distinct executiontimestamp
14 distinct nodeid
59 distinct directiveid
26 distinct ruleid
35 distinct serial
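
(These cardinalities can be gathered with a query along these lines; a sketch, not
necessarily the exact statement used:)

select count(*)                           as total_rows,
       count(distinct executiontimestamp) as distinct_executiontimestamp,
       count(distinct nodeid)             as distinct_nodeid,
       count(distinct directiveid)        as distinct_directiveid,
       count(distinct ruleid)             as distinct_ruleid,
       count(distinct serial)             as distinct_serial
from ruddersysevents;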

Related table/index size are
relation | size
----------------------------------------+---------
public.ruddersysevents | 3190 MB
public.nodeid_idx | 614 MB
public.configurationruleid_idx | 592 MB
public.ruddersysevents_pkey | 236 MB
public.executiontimestamp_idx | 236 MB


I'm crunching the data by looking for each
nodeid/ruleid/directiveid/serial with an executiontimestamp in an interval:

explain analyze select executiondate, nodeid, ruleid, directiveid,
serial, component, keyValue, executionTimeStamp, eventtype, policy, msg
from RudderSysEvents where 1=1 and nodeId =
'31264061-5ecb-4891-9aa4-83824178f43d' and ruleId =
'61713ff1-aa6f-4c86-b3cb-7012bee707dd' and serial = 10 and
executiontimestamp between to_timestamp('2012-11-22 16:00:16.005',
'YYYY-MM-DD HH24:MI:SS.MS') and to_timestamp('2013-01-25 18:53:52.467',
'YYYY-MM-DD HH24:MI:SS.MS') ORDER BY executionTimeStamp asc;
Sort (cost=293125.41..293135.03 rows=3848 width=252) (actual
time=28628.922..28647.952 rows=62403 loops=1)
Sort Key: executiontimestamp
Sort Method: external merge Disk: 17480kB
-> Bitmap Heap Scan on ruddersysevents (cost=74359.66..292896.27
rows=3848 width=252) (actual time=1243.150..28338.927 rows=62403 loops=1)
Recheck Cond: ((nodeid =
'31264061-5ecb-4891-9aa4-83824178f43d'::text) AND (ruleid =
'61713ff1-aa6f-4c86-b3cb-7012bee707dd'::text))
Filter: ((serial = 10) AND (executiontimestamp >=
to_timestamp('2012-11-22 16:00:16.005'::text, 'YYYY-MM-DD
HH24:MI:SS.MS'::text)) AND (executiontimestamp <=
to_timestamp('2013-01-25 18:53:52.467'::text, 'YYYY-MM-DD
HH24:MI:SS.MS'::text)))
-> BitmapAnd (cost=74359.66..74359.66 rows=90079 width=0)
(actual time=1228.610..1228.610 rows=0 loops=1)
-> Bitmap Index Scan on nodeid_idx
(cost=0.00..25795.17 rows=716237 width=0) (actual time=421.365..421.365
rows=690503 loops=1)
Index Cond: (nodeid =
'31264061-5ecb-4891-9aa4-83824178f43d'::text)
-> Bitmap Index Scan on configurationruleid_idx
(cost=0.00..48562.32 rows=1386538 width=0) (actual time=794.490..794.490
rows=1381391 loops=1)
Index Cond: (ruleid =
'61713ff1-aa6f-4c86-b3cb-7012bee707dd'::text)
Total runtime: 28657.352 ms



I'm surprised that the executiontimestamp index is not used, since it
seems to be where most of the query time is spent.
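
(For reference, per-index scan counts can be checked with something like the
following, using the standard pg_stat_user_indexes statistics view; idx_scan
stays at 0 for indexes the planner never touches:)

select indexrelname, idx_scan, idx_tup_read
from pg_stat_user_indexes
where relname = 'ruddersysevents';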

For all my tests, I removed all the incoming logs, so that this table
receives only selects and no writes.

I'm using Postgres 8.4 on a quite smallish VM, with some processes running,
with the following non-default configuration:
shared_buffers = 112MB
work_mem = 8MB
maintenance_work_mem = 48MB
max_stack_depth = 3MB
wal_buffers = 1MB
effective_cache_size = 128MB
checkpoint_segments = 6

Increasing shared_buffers to 384 MB, 1 GB or 1500 MB didn't improve
performance (less than 10%). I would have expected it to, since the indexes
would then all fit in RAM.
Every explain analyze made after a change of configuration or indexes was
run after a complete log-crunching batch of our application, to be sure the
statistics were correct.
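
(When in doubt, the planner statistics for the table can also be refreshed by
hand, for instance:)

-- refresh planner statistics and reclaim dead rows on the table
vacuum analyze ruddersysevents;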

I tested further with the indexes, and reached these results:
1/ Dropping the single-column indexes "configurationruleid_idx" btree (ruleid),
"executiontimestamp_idx" btree (executiontimestamp), "keyvalue_idx" btree
(keyvalue) and "nodeid_idx" btree (nodeid), and replacing them with one
composite index, lowered the performance:

create index composite_idx on ruddersysevents (executiontimestamp,
ruleid, serial, nodeid);

Index Scan using composite_idx on ruddersysevents (cost=0.01..497350.22
rows=3729 width=252) (actual time=7.989..83717.901 rows=62403 loops=1)
Index Cond: ((executiontimestamp >= to_timestamp('2012-11-22
16:00:16.005'::text, 'YYYY-MM-DD HH24:MI:SS.MS'::text)) AND
(executiontimestamp <= to_timestamp('2013-01-25 18:53:52.467'::text,
'YYYY-MM-DD HH24:MI:SS.MS'::text)) AND (ruleid =
'61713ff1-aa6f-4c86-b3cb-7012bee707dd'::text) AND (serial = 10) AND
(nodeid = '31264061-5ecb-4891-9aa4-83824178f43d'::text))


2/ Removing nodeid from the composite index lowered the performance again:

create index composite2_idx on ruddersysevents (executiontimestamp,
ruleid, serial);

Index Scan using composite2_idx on ruddersysevents
(cost=0.01..449948.98 rows=3729 width=252) (actual
time=23.507..84888.349 rows=62403 loops=1)
Index Cond: ((executiontimestamp >= to_timestamp('2012-11-22
16:00:16.005'::text, 'YYYY-MM-DD HH24:MI:SS.MS'::text)) AND
(executiontimestamp <= to_timestamp('2013-01-25 18:53:52.467'::text,
'YYYY-MM-DD HH24:MI:SS.MS'::text)) AND (ruleid =
'61713ff1-aa6f-4c86-b3cb-7012bee707dd'::text) AND (serial = 10))
Filter: (nodeid = '31264061-5ecb-4891-9aa4-83824178f43d'::text)

3/ Removing executiontimestamp from the composite index makes the query
perform better at the beginning of the run (around 17 seconds), but over
time it degrades (I log queries longer than 20 seconds; they are very rare
in the first half of the batch and become more and more common towards the
end) down to what is shown below:

create index composite3_idx on ruddersysevents (ruleid, serial, nodeid);
Sort (cost=24683.44..24693.07 rows=3849 width=252) (actual
time=60454.558..60474.013 rows=62403 loops=1)
Sort Key: executiontimestamp
Sort Method: external merge Disk: 17480kB
-> Bitmap Heap Scan on ruddersysevents (cost=450.12..24454.23
rows=3849 width=252) (actual time=146.065..60249.143 rows=62403 loops=1)
Recheck Cond: ((ruleid =
'61713ff1-aa6f-4c86-b3cb-7012bee707dd'::text) AND (serial = 10) AND
(nodeid = '31264061-5ecb-4891-9aa4-83824178f43d'::text))
Filter: ((executiontimestamp >= to_timestamp('2012-11-22
16:00:16.005'::text, 'YYYY-MM-DD HH24:MI:SS.MS'::text)) AND
(executiontimestamp <= to_timestamp('2013-01-25 18:53:52.467'::text,
'YYYY-MM-DD HH24:MI:SS.MS'::text)))
-> Bitmap Index Scan on composite3_idx (cost=0.00..449.15
rows=6635 width=0) (actual time=129.102..129.102 rows=62403 loops=1)
Index Cond: ((ruleid =
'61713ff1-aa6f-4c86-b3cb-7012bee707dd'::text) AND (serial = 10) AND
(nodeid = '31264061-5ecb-4891-9aa4-83824178f43d'::text))
Total runtime: 60484.022 ms


With all these tests, the total processing time of the whole batch with
each index variant stays within a 10% margin (2h10 to 2h25), even though
most of the time is spent on the SQL queries.

So my question is:
"Why does *not* indexing the column which is not used make the query slower
over time, while not slowing down the application?"

If you have some clues, I'm really interested.

Best regards
Nicolas
Ireneusz Pluta
2013-02-14 18:25:54 UTC
Post by Nicolas Charles
explain analyze select executiondate, nodeid, ruleid, directiveid, serial, component, keyValue,
executionTimeStamp, eventtype, policy, msg from RudderSysEvents where 1=1 and nodeId =
'31264061-5ecb-4891-9aa4-83824178f43d' and ruleId = '61713ff1-aa6f-4c86-b3cb-7012bee707dd' and
serial = 10 and executiontimestamp between to_timestamp('2012-11-22 16:00:16.005', 'YYYY-MM-DD
HH24:MI:SS.MS') and to_timestamp('2013-01-25 18:53:52.467', 'YYYY-MM-DD HH24:MI:SS.MS') ORDER BY
executionTimeStamp asc;
[...]
I'm surprised that the executiontimestamp index is not used, since it seems to be where most of
the query time is spent.
This usage pattern is quite similar to one I used to have problems with. The key problem here is
that the planner wants to BitmapAnd indexes that are spread over the whole table, across all
timestamp values, regardless of the fact that you are interested in only a narrow timestamp window,
and it is quite aggressive about using the bitmapscan feature. So the planner needs to be directed
more precisely.

You could try the above again with:

SET enable_bitmapscan TO off ?

It helped in my case.
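
If you only want to disable it for that one query, SET LOCAL inside a transaction keeps the change
scoped to that transaction (a sketch):

begin;
set local enable_bitmapscan = off;  -- reverts automatically at commit or rollback
-- run the select ... order by executionTimeStamp here
commit;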

You may also try enclosing the timestamp condition in a "preselecting" CTE, and doing the rest of
the finer filtering outside of it, like:

with p as (
    select * from RudderSysEvents
    where executiontimestamp between '2012-11-22 16:00:16.005'
                                 and '2013-01-25 18:53:52.467'
)
select executiondate, nodeid, ruleid, directiveid, serial, component, keyValue,
       executionTimeStamp, eventtype, policy, msg
from p
where nodeId = '31264061-5ecb-4891-9aa4-83824178f43d'
  and ruleId = '61713ff1-aa6f-4c86-b3cb-7012bee707dd'
  and serial = 10;

As a side note, I think that all your indexes, except the timestamp one, are unnecessary, because
of the low cardinality of their values and, as you can see, the confusion they cause the planner.

Eventually, you may use one of the columns (the one which is always used for filtering) as the
second column of a two-column index together with the timestamp, and add its filter condition
inside the CTE part.
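
For instance, assuming nodeid is the column that is always filtered on (the index name below is
just an example):

create index executiontimestamp_nodeid_idx
    on ruddersysevents (executiontimestamp, nodeid);

with p as (
    select * from RudderSysEvents
    where executiontimestamp between '2012-11-22 16:00:16.005'
                                 and '2013-01-25 18:53:52.467'
      and nodeid = '31264061-5ecb-4891-9aa4-83824178f43d'
)
select executiondate, nodeid, ruleid, directiveid, serial, component, keyValue,
       executionTimeStamp, eventtype, policy, msg
from p
where ruleId = '61713ff1-aa6f-4c86-b3cb-7012bee707dd'
  and serial = 10;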

HTH,
Irek.
Jeff Janes
2013-02-14 19:27:33 UTC
On Thu, Feb 14, 2013 at 7:35 AM, Nicolas Charles
Post by Nicolas Charles
108492 distinct executiontimestamp
14 distinct nodeid
59 distinct directiveid
26 distinct ruleid
35 distinct serial
How many entries fall within a typical query interval of executiontimestamp?

...
Post by Nicolas Charles
I'm surprised that the executiontimestamp index is not used, since it seems
to be where most of the query time is spent.
I do not draw that conclusion from your posted information. Can you
highlight the parts of it that lead you to this conclusion?
Post by Nicolas Charles
For all my tests, I removed all the incoming logs, so that this table has
only selects and no writes
I'm using Postgres 8.4, on a quite smallish VM, with some process runnings,
A lot of improvements have been made since 8.4 which would make this
kind of thing easier to figure out. What is smallish?
Post by Nicolas Charles
with the following non default configuration
shared_buffers = 112MB
work_mem = 8MB
maintenance_work_mem = 48MB
max_stack_depth = 3MB
wal_buffers = 1MB
effective_cache_size = 128MB
effective_cache_size seems small unless you expect to have a lot of
this type of query running simultaneously, assuming you have at least
4GB of RAM, which I'm guessing you do based on your next comments.
Post by Nicolas Charles
checkpoint_segments = 6
Increasing the shared_buffers to 384, 1GB or 1500MB didn't improve the
performances (less than 10%). I would have expected it to improve, since the
indexes would all fit in RAM
If the indexes fit in RAM, they fit in RAM. If anything, increasing
shared_buffers could make it harder to fit them entirely in RAM. If
your shared buffers undergo a lot of churn, then the OS cache and the
shared buffers tend to uselessly mirror each other, meaning there is
less space for non-redundant pages.
Post by Nicolas Charles
create index composite_idx on ruddersysevents (executiontimestamp, ruleid,
serial, nodeid);
I wouldn't expect this to work well for this particular query. Since
the leading column is used in a range test, the following columns
cannot be used efficiently in the index structure. You should put the
equality-tested columns at the front of the index and the range-tested
one at the end of it.
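
For this query, that would mean something along these lines (the index name is arbitrary):

-- equality-tested columns first, the range-tested timestamp last
create index composite_eq_first_idx
    on ruddersysevents (nodeid, ruleid, serial, executiontimestamp);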
Post by Nicolas Charles
2/ Removing nodeid from the index did lower again the perf
create index composite2_idx on ruddersysevents (executiontimestamp, ruleid,
serial);
I doubt that 84888.349 vs 83717.901 is really a meaningful difference.
Post by Nicolas Charles
3/ Removing executiontimestamp from the composite index makes the query
performs better at the begining of its uses (around 17 secondes), but over
time it degrades (I'm logging query longer than 20 secondes, and there are
very rare in the first half of the batch, and getting more and more common
at the end) to what is below
If the batch processing adds data, it is not surprising the query
slows down. It looks like it is still faster at the end than the
previous two cases, right?
Post by Nicolas Charles
"Why *not* indexing the column which is not used makes the query slower over
time, while not slowing the application?"
I don't know what column you are referring to here. But it sounds
like you think that dropping the leading column from an index is a
minor change. It is not. It makes a fundamentally different index.

Cheers,

Jeff
Nicolas Charles
2013-02-15 09:00:24 UTC
Post by Jeff Janes
On Thu, Feb 14, 2013 at 7:35 AM, Nicolas Charles
Post by Nicolas Charles
108492 distinct executiontimestamp
14 distinct nodeid
59 distinct directiveid
26 distinct ruleid
35 distinct serial
How many entries fall within a typical query interval of executiontimestamp?
Around 65,000 entries.
Post by Jeff Janes
Post by Nicolas Charles
I'm surprised that the executiontimestamp index is not used, since it seems
to be where most of the query time is spent.
I do not draw that conclusion from your posted information. Can you
highlight the parts of it that lead you to this conclusion?
The index scans are on nodeid_idx and configurationruleid_idx, not on
executiontimestamp.
Or am I misreading the output?
Post by Jeff Janes
Post by Nicolas Charles
For all my tests, I removed all the incoming logs, so that this table has
only selects and no writes
I'm using Postgres 8.4, on a quite smallish VM, with some process runnings,
A lot of improvements have been made since 8.4 which would make this
kind of thing easier to figure out. What is smallish?
A VM with 1 core, 2 GB RAM, a single hard drive
Post by Jeff Janes
Post by Nicolas Charles
with the following non default configuration
shared_buffers = 112MB
work_mem = 8MB
maintenance_work_mem = 48MB
max_stack_depth = 3MB
wal_buffers = 1MB
effective_cache_size = 128MB
effective_cache_size seems small unless you expect to have a lot of
this type of query running simultaneously, assuming you have at least
4GB of RAM, which I'm guessing you do based on your next comments.
For the sake of the test, the VM got its memory increased, with no
significant changes
Post by Jeff Janes
Post by Nicolas Charles
checkpoint_segments = 6
Increasing the shared_buffers to 384, 1GB or 1500MB didn't improve the
performances (less than 10%). I would have expected it to improve, since the
indexes would all fit in RAM
If the indexes fit in RAM, they fit in RAM. If anything, increasing
shared_buffers could make it harder to fit them entirely in RAM. If
your shared buffers undergo a lot of churn, then the OS cache and the
shared buffers tend to uselessly mirror each other, meaning there is
less space for non-redundant pages.
Oh!
So I completely misunderstood the meaning of shared_buffers; I figured
that it was somehow the place where the data (like indexes) would be
stored by postgres.
Post by Jeff Janes
Post by Nicolas Charles
create index composite_idx on ruddersysevents (executiontimestamp, ruleid,
serial, nodeid);
I wouldn't expect this to work well for this particular query. Since
the leading column is used in a range test, the following columns
cannot be used efficiently in the index structure. You should put the
equality-tested columns at the front of the index and the range-tested
one at the end of it.
Post by Nicolas Charles
2/ Removing nodeid from the index did lower again the perf
create index composite2_idx on ruddersysevents (executiontimestamp, ruleid,
serial);
I doubt that 84888.349 vs 83717.901 is really a meaningful difference.
Post by Nicolas Charles
3/ Removing executiontimestamp from the composite index makes the query
performs better at the begining of its uses (around 17 secondes), but over
time it degrades (I'm logging query longer than 20 secondes, and there are
very rare in the first half of the batch, and getting more and more common
at the end) to what is below
If the batch processing adds data, it is not surprising the query
slows down. It looks like it is still faster at the end then the
previous two cases, right?
Actually, the batch reads data from this table and writes into another,
so the content of the table doesn't change at all.
And yes, it is faster than the two previous cases.
Post by Jeff Janes
Post by Nicolas Charles
"Why *not* indexing the column which is not used makes the query slower over
time, while not slowing the application?"
I don't know what column you are referring to here. But it sounds
like you think that dropping the leading column from an index is a
minor change. It is not. It makes a fundamentally different index.
I was referring to the executionTimeStamp column. I know it is a HUGE
change, but it's clearly not behaving the way I thought.
With your remark I understand a little better what is going on, and I
can test what I'm doing more carefully.

Thank you !
Nicolas
Jeff Janes
2013-02-15 18:04:07 UTC
On Fri, Feb 15, 2013 at 1:00 AM, Nicolas Charles
Post by Nicolas Charles
Post by Jeff Janes
On Thu, Feb 14, 2013 at 7:35 AM, Nicolas Charles
Post by Nicolas Charles
108492 distinct executiontimestamp
14 distinct nodeid
59 distinct directiveid
26 distinct ruleid
35 distinct serial
How many entries fall within a typical query interval of
executiontimestamp?
Around 65,000 entries.
Post by Jeff Janes
Post by Nicolas Charles
I'm surprised that the executiontimestamp index is not used, since it seems
to be where most of the query time is spent.
I do not draw that conclusion from your posted information. Can you
highlight the parts of it that lead you to this conclusion?
The index scan are on nodeid_idx and configurationruleid_idx, not on
executiontimestamp
Or am I misreading the output ?
You are correct about the use of nodeid_idx and
configurationruleid_idx. But the part about most of the time going
into filtering out rows that fail to match executiontimestamp is the
part that I don't think is supported by the explained plan. The
information given by the explain plan is not sufficient to decide that
one way or the other. Some more recent improvements in "explain
(analyze, buffers)" would make it easier (especially with
track_io_timing on) but it would still be hard to draw a definitive
conclusion. The most definitive thing would be to do the experiment
of adding executiontimestamp as a *trailing* column to the end of one
of the existing indexes and see what happens.
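
For example, taking nodeid_idx as the index to extend (the index name below is just an example):

-- same leading column as nodeid_idx, with executiontimestamp appended as a trailing column
create index nodeid_ts_idx
    on ruddersysevents (nodeid, executiontimestamp);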
Post by Nicolas Charles
Post by Jeff Janes
If the indexes fit in RAM, they fit in RAM. If anything, increasing
shared_buffers could make it harder to fit them entirely in RAM. If
your shared buffers undergo a lot of churn, then the OS cache and the
shared buffers tend to uselessly mirror each other, meaning there is
less space for non-redundant pages.
Oh !
So I completely misunderstood the meaning of shared_buffer; I figured that
it was somehow the place where the data would be stored by postgres (like
indexes)
That is correct, it is the space used by postgres to cache data. But,
the rest of RAM (beyond shared_buffers) will also be used to cache
data; but by the OS rather than by postgres. On a dedicated server,
the OS will generally choose to (or at least attempt to) use this
space for the benefit of postgres anyway. If shared_buffers > RAM/2,
it won't be very successful at this, but it will still try. The
kernel and postgres do not have intimate knowledge of each other, so
it is hard to arrange that all pages show up in just one place or the
other but not both.

Cheers,

Jeff
Nicolas Charles
2013-02-16 09:25:30 UTC
Post by Jeff Janes
On Fri, Feb 15, 2013 at 1:00 AM, Nicolas Charles
Post by Nicolas Charles
Post by Jeff Janes
I'm surprised that the executiontimestamp index is not used, since it seems
to be where most of the query time is spent.
I do not draw that conclusion from your posted information. Can you
highlight the parts of it that lead you to this conclusion?
The index scan are on nodeid_idx and configurationruleid_idx, not on
executiontimestamp
Or am I misreading the output ?
You are correct about the use of nodeid_idx and
configurationruleid_idx. But the part about most of the time going
into filtering out rows that fail to match executiontimestamp is the
part that I don't think is supported by the explained plan. The
information given by the explain plan is not sufficient to decide that
one way or the other. Some more recent improvements in "explain
(analyze, buffers)" would make it easier (especially with
track_io_timing on) but it would still be hard to draw a definitive
conclusion. The most definitive thing would be to do the experiment
of adding executiontimestamp as a *trailing* column to the end of one
of the existing indexes and see what happens.
I added this index
"composite_idx" btree (ruleid, serial, executiontimestamp)
and lowered shared_buffers to 54 MB.

For this specific query, it is indeed a bit better. (23s against 28s)

Sort (cost=43449.44..43459.07 rows=3854 width=252) (actual
time=23375.247..23394.704 rows=62403 loops=1)
Sort Key: executiontimestamp
Sort Method: external merge Disk: 17480kB
-> Bitmap Heap Scan on ruddersysevents (cost=28884.44..43219.89
rows=3854 width=252) (actual time=1528.704..23155.991 rows=62403 loops=1)
Recheck Cond: ((ruleid =
'61713ff1-aa6f-4c86-b3cb-7012bee707dd'::text) AND (serial = 10) AND
(executiontimestamp >= to_timestamp('2012-11-22 16:00:16.005'::text,
'YYYY-MM-DD HH24:MI:SS.MS'::text)) AND (executiontimestamp <=
to_timestamp('2013-01-25 18:53:52.467'::text, 'YYYY-MM-DD
HH24:MI:SS.MS'::text)) AND (nodeid =
'31264061-5ecb-4891-9aa4-83824178f43d'::text))
-> BitmapAnd (cost=28884.44..28884.44 rows=3854 width=0)
(actual time=1517.310..1517.310 rows=0 loops=1)
-> Bitmap Index Scan on composite_idx
(cost=0.00..3062.16 rows=59325 width=0) (actual time=775.418..775.418
rows=811614 loops=1)
Index Cond: ((ruleid =
'61713ff1-aa6f-4c86-b3cb-7012bee707dd'::text) AND (serial = 10) AND
(executiontimestamp >= to_timestamp('2012-11-22 16:00:16.005'::text,
'YYYY-MM-DD HH24:MI:SS.MS'::text)) AND (executiontimestamp <=
to_timestamp('2013-01-25 18:53:52.467'::text, 'YYYY-MM-DD
HH24:MI:SS.MS'::text)))
-> Bitmap Index Scan on nodeid_idx (cost=0.00..25820.11
rows=717428 width=0) (actual time=714.254..714.254 rows=690503 loops=1)
Index Cond: (nodeid =
'31264061-5ecb-4891-9aa4-83824178f43d'::text)
Total runtime: 23419.411 ms
(11 rows)

But since there were a lot of "Sort Method: external merge" spills, and a
lot of object instantiations in our batch, we tried to split the job into
5-day intervals, so the average number of lines returned is close to 5000.
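
(The 5-day windows can also be generated on the database side, for example
with generate_series; a sketch:)

select g                     as window_start,
       g + interval '5 days' as window_end
from generate_series('2012-11-22 16:00:16'::timestamptz,
                     '2013-01-25 18:53:52'::timestamptz,
                     interval '5 days') as g;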

Load/IO of the system was *significantly* lower during the batch
treatment with this index than with the previous attempt (load around .7
instead of 1.8)

And batch execution time is now 1h08 rather than 2h20
Post by Jeff Janes
Post by Nicolas Charles
Post by Jeff Janes
If the indexes fit in RAM, they fit in RAM. If anything, increasing
shared_buffers could make it harder to fit them entirely in RAM. If
your shared buffers undergo a lot of churn, then the OS cache and the
shared buffers tend to uselessly mirror each other, meaning there is
less space for non-redundant pages.
Oh !
So I completely misunderstood the meaning of shared_buffer; I figured that
it was somehow the place where the data would be stored by postgres (like
indexes)
That is correct, it is the space used by postgres to cache data. But,
the rest of RAM (beyond shared_buffers) will also be used to cache
data; but by the OS rather than by postgres. On a dedicated server,
the OS will generally choose to (or at least attempt to) use this
space for the benefit of postgres anyway. If shared_buffers > RAM/2,
it won't be very successful at this, but it will still try. The
kernel and postgres do not have intimate knowledge of each other, so
it is hard to arrange that all pages show up in just one place or the
other but not both.
Thank you very much for your explanations!

Nicolas
Merlin Moncure
2013-02-15 21:27:37 UTC
On Thu, Feb 14, 2013 at 9:35 AM, Nicolas Charles
Post by Nicolas Charles
explain analyze select executiondate, nodeid, ruleid, directiveid, serial,
component, keyValue, executionTimeStamp, eventtype, policy, msg from
RudderSysEvents where 1=1 and nodeId =
'31264061-5ecb-4891-9aa4-83824178f43d' and ruleId =
'61713ff1-aa6f-4c86-b3cb-7012bee707dd' and serial = 10 and
executiontimestamp between to_timestamp('2012-11-22 16:00:16.005',
'YYYY-MM-DD HH24:MI:SS.MS') and to_timestamp('2013-01-25 18:53:52.467',
'YYYY-MM-DD HH24:MI:SS.MS') ORDER BY executionTimeStamp asc;
[...]
"Why does *not* indexing the column which is not used make the query slower
over time, while not slowing down the application?"
If you have some clues, I'm really interested.
This query can be optimized to zero (assuming data in cache and not
too many dead rows).

create one index on nodeId, ruleId, serial, executiontimestamp columns and do:

SELECT
executiondate, nodeid, ruleid, directiveid, serial, component,
keyValue, executionTimeStamp, eventtype, policy, msg
FROM RudderSysEvents
WHERE (nodeId, ruleId, serial, executiontimestamp) >= (
'31264061-5ecb-4891-9aa4-83824178f43d',
'61713ff1-aa6f-4c86-b3cb-7012bee707dd',
10, to_timestamp('2012-11-22 16:00:16.005', 'YYYY-MM-DD HH24:MI:SS.MS'))
AND (nodeId, ruleId, serial, executiontimestamp) <= (
'31264061-5ecb-4891-9aa4-83824178f43d',
'61713ff1-aa6f-4c86-b3cb-7012bee707dd',
10, to_timestamp('2013-01-25 18:53:52.467', 'YYYY-MM-DD HH24:MI:SS.MS'))
ORDER BY nodeId, ruleId, serial, executiontimestamp
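
The single index referred to above would be something like (the name is arbitrary):

create index nodeid_ruleid_serial_ts_idx
    on ruddersysevents (nodeid, ruleid, serial, executiontimestamp);

Since the ORDER BY matches the index column order, an index scan can return the
rows already sorted, so the external merge sort should disappear as well.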

merlin