Discussion:
Why does the query planner use two full indexes, when a dedicated partial index exists?
Richard Neill
2012-12-19 21:13:06 UTC
Dear All,

I've just joined this list, and I'd like to request some advice.

I have a table (1 GB in size) with 24 columns, and 5.6 million rows. Of
these, we're interested in two columns, parcel_id_code, and exit_state.

parcel_id_code has a fairly uniform distribution of integers
from 1 to 99999; it's never null.

exit_state has 3 possible values: 1, 2, and null.
Almost all the rows are 1, about 0.1% have the value 2, and
only 153 rows are null.


The query I'm trying to optimise looks like this:

SELECT * from tbl_tracker
WHERE parcel_id_code='53030' AND exit_state IS NULL;

So, I have a partial index:

"tbl_tracker_performance_1_idx" btree (parcel_id_code) WHERE
exit_state IS NULL

which works fine if it's the only index.
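(For reference, the DDL behind that partial index, reconstructed here from the \d listing rather than quoted from my setup script, would be:)

```sql
-- Partial index covering only the "in flight" rows (exit_state IS NULL)
CREATE INDEX tbl_tracker_performance_1_idx
    ON tbl_tracker (parcel_id_code)
    WHERE exit_state IS NULL;
```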


BUT, for other queries (unrelated to this question), I also have to have
full indexes on these columns:

"tbl_tracker_exit_state_idx" btree (exit_state)
"tbl_tracker_parcel_id_code_idx" btree (parcel_id_code)


The problem is, when I now run my query, the planner ignores the
dedicated index "tbl_tracker_performance_1_idx", and instead uses both
of the full indexes... resulting in a much much slower query (9ms vs
0.08ms).

A psql session is below. This shows that, if I force the planner to use
the partial index, by dropping the others, then it's fast. But as soon
as I put the full indexes back (which I need for other queries), the
query planner chooses them instead, and is slow.


Thanks very much for your help,

Richard










fsc_log => \d tbl_tracker

Column | Type | Modifiers
---------------------+--------------------------+------------------
id | bigint | not null default
nextval('master_id_seq'::regclass)
dreq_timestamp_1 | timestamp with time zone |
barcode_1 | character varying(13) |
barcode_2 | character varying(13) |
barcode_best | character varying(13) |
entrance_point | character varying(13) |
induct | character varying(5) |
entrance_state_x | integer |
dreq_count | integer |
parcel_id_code | integer |
host_id_code | bigint |
original_dest | integer |
drep_timestamp_n | timestamp with time zone |
actual_dest | integer |
exit_state | integer |
chute | integer |
original_dest_state | integer |
srep_timestamp | timestamp with time zone |
asn | character varying(9) |
is_asn_token | boolean |
track_state | integer |
warning | boolean |
Indexes:
"tbl_tracker_pkey" PRIMARY KEY, btree (id) CLUSTER
"tbl_tracker_barcode_best_idx" btree (barcode_best)
"tbl_tracker_chute_idx" btree (chute)
"tbl_tracker_drep_timestamp_n_idx" btree (drep_timestamp_n) WHERE
drep_timestamp_n IS NOT NULL
"tbl_tracker_dreq_timestamp_1_idx" btree (dreq_timestamp_1) WHERE
dreq_timestamp_1 IS NOT NULL
"tbl_tracker_exit_state_idx" btree (exit_state)
"tbl_tracker_parcel_id_code_idx" btree (parcel_id_code)
"tbl_tracker_performance_1_idx" btree (parcel_id_code) WHERE
exit_state IS NULL
"tbl_tracker_performance_2_idx" btree (host_id_code, id)
"tbl_tracker_performance_3_idx" btree (srep_timestamp) WHERE
exit_state = 1 AND srep_timestamp IS NOT NULL
"tbl_tracker_srep_timestamp_idx" btree (srep_timestamp) WHERE
srep_timestamp IS NOT NULL




fsc_log=> explain analyse select * from tbl_tracker where
parcel_id_code='53030' AND exit_state IS NULL;

QUERY PLAN
-----------------------------------------------------------------------
Bitmap Heap Scan on tbl_tracker (cost=8.32..10.84 rows=1 width=174)
(actual time=9.334..9.334 rows=0 loops=1)
Recheck Cond: ((parcel_id_code = 53030) AND (exit_state IS NULL))
-> BitmapAnd (cost=8.32..8.32 rows=1 width=0) (actual
time=9.329..9.329 rows=0 loops=1)
-> Bitmap Index Scan on tbl_tracker_parcel_id_code_idx
(cost=0.00..3.67 rows=57 width=0) (actual time=0.026..0.026 rows=65 loops=1)
Index Cond: (parcel_id_code = 53030)
-> Bitmap Index Scan on tbl_tracker_exit_state_idx
(cost=0.00..4.40 rows=150 width=0) (actual time=9.289..9.289 rows=93744
loops=1)
Index Cond: (exit_state IS NULL)
Total runtime: 9.366 ms
(8 rows)



fsc_log=> drop index tbl_tracker_exit_state_idx;
DROP INDEX

fsc_log=> explain analyse select * from tbl_tracker where
parcel_id_code='53030' AND exit_state IS NULL;

QUERY PLAN
----------------------------------------------------------------------------------------
Bitmap Heap Scan on tbl_tracker (cost=3.67..145.16 rows=1 width=174)
(actual time=0.646..0.646 rows=0 loops=1)
Recheck Cond: (parcel_id_code = 53030)
Filter: (exit_state IS NULL)
-> Bitmap Index Scan on tbl_tracker_parcel_id_code_idx
(cost=0.00..3.67 rows=57 width=0) (actual time=0.024..0.024 rows=65 loops=1)
Index Cond: (parcel_id_code = 53030)
Total runtime: 0.677 ms
(6 rows)




fsc_log=> drop index tbl_tracker_parcel_id_code_idx;
DROP INDEX

fsc_log=> explain analyse select * from tbl_tracker where
parcel_id_code='53030' AND exit_state IS NULL;

QUERY PLAN
--------------------------------------------------------------------------
Index Scan using tbl_tracker_performance_1_idx on tbl_tracker
(cost=0.00..5440.83 rows=1 width=174) (actual time=0.052..0.052 rows=0
loops=1)
Index Cond: (parcel_id_code = 53030)
Total runtime: 0.080 ms
(3 rows)



Server hardware: 8 core, 2.5 GHz, 24 GB, SSD in RAID-1.

Postgresql config (non-default):

version | PostgreSQL 9.1.6 on x86_64
checkpoint_segments | 128
client_encoding | UTF8
commit_delay | 50000
commit_siblings | 5
default_statistics_target | 5000
effective_cache_size | 12000MB
lc_collate | en_GB.UTF-8
lc_ctype | en_GB.UTF-8
log_line_prefix | %t
log_min_duration_statement | 50
maintenance_work_mem | 2GB
max_connections | 100
max_stack_depth | 4MB
port | 5432
random_page_cost | 2.5
server_encoding | UTF8
shared_buffers | 6000MB
ssl | on
standard_conforming_strings | off
temp_buffers | 128MB
TimeZone | GB
wal_buffers | 16MB
work_mem | 256MB
--
Sent via pgsql-performance mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Sergey Konoplev
2012-12-19 22:59:39 UTC
Hi,
Post by Richard Neill
Index Scan using tbl_tracker_performance_1_idx on tbl_tracker
(cost=0.00..5440.83 rows=1 width=174) (actual time=0.052..0.052 rows=0
loops=1)
Index Cond: (parcel_id_code = 53030)
It looks like your index is bloated. Have you had a lot of
updates/deletes on rows where exit_state is null?

Try to reindex tbl_tracker_performance_1_idx.

To reindex it without locks, create a new index with a temporary name
concurrently, drop the old one, and rename the new one to the old
name.
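Spelled out in SQL, the recipe looks roughly like this (index name and predicate taken from this thread; the temporary name is made up, and note that CREATE INDEX CONCURRENTLY cannot run inside a transaction block):

```sql
-- Build a replacement index without blocking writes
CREATE INDEX CONCURRENTLY tbl_tracker_performance_1_tmp
    ON tbl_tracker (parcel_id_code)
    WHERE exit_state IS NULL;

-- Swap it in place of the bloated one
DROP INDEX tbl_tracker_performance_1_idx;
ALTER INDEX tbl_tracker_performance_1_tmp
    RENAME TO tbl_tracker_performance_1_idx;
```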

--
Sergey Konoplev
Database and Software Architect
http://www.linkedin.com/in/grayhemp

Phones:
USA +1 415 867 9984
Russia, Moscow +7 901 903 0499
Russia, Krasnodar +7 988 888 1979

Skype: gray-hemp
Jabber: ***@gmail.com
Richard Neill
2012-12-19 23:49:34 UTC
Post by Sergey Konoplev
Post by Richard Neill
Index Scan using tbl_tracker_performance_1_idx on tbl_tracker
(cost=0.00..5440.83 rows=1 width=174) (actual time=0.052..0.052 rows=0
loops=1)
Index Cond: (parcel_id_code = 53030)
It looks like your index is bloated. Have you had a lot of
updates/deletes on rows with exit_state is null?
Try to reindex tbl_tracker_performance_1_idx.
To reindex it without locks create a new index with temporary name
concurrently, delete the old one and rename the new one using the old
name.
Hi Sergey,

Thanks for your suggestion. Yes, I can see what you mean: over the 3
weeks during which we deployed the system, every single row has at one
point had the exit_state as null, before being updated.

Essentially, as time moves on, new rows are added, initially with
exit_state null, then a few minutes later we update them to exit_state
1, then a few weeks later, they are removed.

[Explanation: the system tracks books around a physical sortation
machine; the sorter uses a "parcel_id_code" which (for some really daft
reason) suffers wraparound at 99999, i.e. about every 3 hours. Books
whose exit_state is null are those which are still on the sortation
machine; once they exit, the state is either 1 (successful delivery) or
2 (collision, and down the dump chute).]

BUT....

* The reindex solution doesn't work. I just tried it, and the query
planner is still using the wrong indexes.

* If the tbl_tracker_performance_1_idx had indeed become bloated,
wouldn't that have meant that when the query planner was forced to use
it (by deleting the alternative indexes), it would have been slow?

Also, I thought that reindex wasn't supposed to be needed in normal
operation.

Best wishes,

Richard
Sergey Konoplev
2012-12-20 00:08:58 UTC
* The reindex solution doesn't work. I just tried it, and the query planner
is still using the wrong indexes.
Can you show the explain analyze with tbl_tracker_performance_1_idx
straight after reindex (eg. before it has been bloated again)?
* If the tbl_tracker_performance_1_idx had indeed become bloated, wouldn't
that have meant that when the query planner was forced to use it (by
deleting the alternative indexes), it would have been slow?
It is hard to say. There might be a bloating threshold after which it
will be slow. Also it depends on the index column values.

Richard Neill
2012-12-20 00:22:49 UTC
Thanks for your help,
Post by Sergey Konoplev
* The reindex solution doesn't work. I just tried it, and the query planner
is still using the wrong indexes.
Can you show the explain analyze with tbl_tracker_performance_1_idx
straight after reindex (eg. before it has been bloated again)?
Sure. Just done it now... the system has been fairly lightly loaded for
the last few hours - though I did have to change the specific number of
the parcel_id_code in the query.



fsc_log=> explain analyse select * from tbl_tracker where
parcel_id_code=92223 and exit_state is null;

QUERY PLAN
-----------------------------------------------------------
Index Scan using tbl_tracker_exit_state_idx on tbl_tracker
(cost=0.00..6.34 rows=1 width=174) (actual time=0.321..1.871 rows=1 loops=1)
Index Cond: (exit_state IS NULL)
Filter: (parcel_id_code = 92223)
Total runtime: 1.905 ms
(4 rows)



And now, force it, by dropping the other index (temporarily):

fsc_log=> drop index tbl_tracker_exit_state_idx;
DROP INDEX


fsc_log=> explain analyse select * from tbl_tracker where
parcel_id_code=92223 and exit_state is null;

QUERY PLAN
---------------------------------------------------------------------
Index Scan using tbl_tracker_performance_1_idx on tbl_tracker
(cost=0.00..7.78 rows=1 width=174) (actual time=0.040..0.041 rows=1 loops=1)
Index Cond: (parcel_id_code = 92223)
Total runtime: 0.077 ms
(3 rows)



As far as I can tell, the query planner really is just getting it wrong.

BTW, there is a significant effect on speed caused by running the same
query twice (it pulls stuff from disk into the OS disk-cache), but I've
already accounted for this.


Richard
Jeff Janes
2012-12-20 05:12:14 UTC
Post by Richard Neill
Thanks for your help,
* The reindex solution doesn't work. I just tried it, and the query planner
is still using the wrong indexes.
It switched to a better one of the wrong indices, though, and got several
times faster.

How did it get so bloated in the first place? Is the table being updated
so rapidly that the statistics might be wrong even immediately after
analyze finishes?

In any case, I can't get it to prefer the full index in 9.1.6 at all. The
partial index wins hands down unless the table is physically clustered by
the parcel_id_code column; in that case, the partial index wins by
only a little bit.

This is what I did for the table:

create table tbl_tracker as select case when random()<0.001 then 2 else
case when random()< 0.00003 then NULL else 1 end end as exit_state,
(random()*99999)::int as parcel_id_code from generate_series(1,5000000) ;

Cheers,

Jeff
Richard Neill
2012-12-20 05:57:14 UTC
Dear Jeff,

Thanks for your help,
Post by Richard Neill
* The reindex solution doesn't work. I just tried it, and
the query planner
is still using the wrong indexes.
It switched to a better one of the wrong indices, though, and got
several times faster.
I think that this is a red herring. The switching between the two
"wrong" indices seems to be caused by non-uniformity in the
parcel_id_code: although it's distributed fairly well across 1-99999,
it's not perfect.

As for the speed-up, I think that's mostly caused by the fact that
running "Analyse" is pulling the entire table (and the relevant index)
into RAM and flushing other things out of that cache.
Post by Richard Neill
How did it get so bloated in the first place? Is the table being
updated so rapidly that the statistics might be wrong even immediately
after analyze finishes?
I don't think it is. We're doing about 10 inserts and 20 updates per
second on that table. But when I tested it, production had stopped for
the night - so the system was quiescent between the analyse and the select.
Post by Richard Neill
In any case, I can't get it to prefer the full index in 9.1.6 at all.
The partial index wins hands down unless the table is physically
clustered by the parcel_id_code column. In which that case, the partial
index wins by only a little bit.
Interesting that you should say that... the original setup script did
choose to cluster the table on that column.

Also, I wonder whether it matters which order the indexes are created in?


Best wishes,

Richard
Tom Lane
2012-12-20 15:43:26 UTC
Post by Richard Neill
Also, I wonder whether it matters which order the indexes are created in?
IIRC, if the estimated costs of using two different indexes come out the
same (to within 1% or so), then the planner keeps the first-generated
path, which will result in preferring the index with smaller OID. This
effect doesn't apply to your problem query though, since we can see from
the drop-experiments that the estimated costs are quite a bit different.

A more likely explanation if you see some effect that looks like order
dependency is that the more recently created index has accumulated less
bloat, and thus has a perfectly justifiable cost advantage.

regards, tom lane
Tom Lane
2012-12-20 15:49:03 UTC
Post by Jeff Janes
In any case, I can't get it to prefer the full index in 9.1.6 at all. The
partial index wins hands down unless the table is physically clustered by
the parcel_id_code column; in that case, the partial index wins by
only a little bit.
create table tbl_tracker as select case when random()<0.001 then 2 else
case when random()< 0.00003 then NULL else 1 end end as exit_state,
(random()*99999)::int as parcel_id_code from generate_series(1,5000000) ;
What I did to try to duplicate Richard's situation was to create a test
table in which all the exit_state values were NULL, then build the
index, then UPDATE all but a small random fraction of the rows to 1,
then vacuum. This results in a rather bloated partial index, but I
think that's probably what he's got given that every record initially
enters the table with NULL exit_state. It would take extremely frequent
vacuuming to keep the partial index from accumulating a lot of dead
entries.
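The setup described above, reconstructed as SQL (a sketch only; the row count and the surviving-NULL fraction are my guesses, not taken from the message):

```sql
-- All rows start with exit_state NULL, as on Richard's sorter
CREATE TABLE tbl_tracker AS
    SELECT NULL::int AS exit_state,
           (random() * 99999)::int AS parcel_id_code
    FROM generate_series(1, 5000000);

-- Build the partial index while every row still matches its predicate
CREATE INDEX tbl_tracker_performance_1_idx
    ON tbl_tracker (parcel_id_code)
    WHERE exit_state IS NULL;

-- Flip all but a small random fraction to 1, then vacuum:
-- the partial index is now mostly dead entries
UPDATE tbl_tracker SET exit_state = 1 WHERE random() >= 0.0001;
VACUUM ANALYZE tbl_tracker;
```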

In this scenario, with 9.1, I got overly large estimates for the cost of
using the partial index; which matches up with his reports.

regards, tom lane
Jeff Janes
2012-12-21 03:40:11 UTC
Post by Tom Lane
Post by Jeff Janes
In any case, I can't get it to prefer the full index in 9.1.6 at all. The
partial index wins hands down unless the table is physically clustered by
the parcel_id_code column; in that case, the partial index wins by
only a little bit.
create table tbl_tracker as select case when random()<0.001 then 2 else
case when random()< 0.00003 then NULL else 1 end end as exit_state,
(random()*99999)::int as parcel_id_code from generate_series(1,5000000) ;
What I did to try to duplicate Richard's situation was to create a test
table in which all the exit_state values were NULL, then build the
index, then UPDATE all but a small random fraction of the rows to 1,
then vacuum. This results in a rather bloated partial index, but I
think that's probably what he's got given that every record initially
enters the table with NULL exit_state. It would take extremely frequent
vacuuming to keep the partial index from accumulating a lot of dead
entries.
I played with this scenario too, but still the only way I could get it to
spurn the partial index is if I rebuild the full one (to remove the bloat)
but not the partial one. But still, the costs were always in the 8 to 11
range for either index with default cost settings. It is hard to imagine
the amount of bloat needed to drive it up to 5000, like in his initial
report before he rebuilt it.

Cheers,

Jeff
Jeff Janes
2012-12-28 02:35:31 UTC
Post by Tom Lane
What I did to try to duplicate Richard's situation was to create a test
table in which all the exit_state values were NULL, then build the
index, then UPDATE all but a small random fraction of the rows to 1,
then vacuum. This results in a rather bloated partial index, but I
think that's probably what he's got given that every record initially
enters the table with NULL exit_state. It would take extremely frequent
vacuuming to keep the partial index from accumulating a lot of dead
entries.
Once I cranked up default_statistics_target, I could start reproducing the
very high estimates (5000) for the partial index in 9.1.

As you say, switching to 9.2 or above lowers it quite a bit, but I still
get some pretty high estimates: ~100 when 8 would be more accurate.

The problem is in genericcostestimate:

    if (index->pages > 1 && index->tuples > 1)
        numIndexPages = ceil(numIndexTuples * index->pages / index->tuples);

The index->pages should probably not include index pages which are empty.
Even with aggressive vacuuming, most of the pages in the partial index
seem to be empty at any given time.

However, I don't know if that number is exposed readily. And it seems to
be updated too slowly to be useful, if pg_freespace is to be believed.

But I wonder if it couldn't be clamped so that there can be no more
pages than there are tuples:

    numIndexPages = ceil(numIndexTuples * Min(1, index->pages / index->tuples));


Cheers,

Jeff

Tom Lane
2012-12-20 03:06:30 UTC
Post by Richard Neill
The problem is, when I now run my query, the planner ignores the
dedicated index "tbl_tracker_performance_1_idx", and instead uses both
of the full indexes... resulting in a much much slower query (9ms vs
0.08ms).
A psql session is below. This shows that, if I force the planner to use
the partial index, by dropping the others, then it's fast. But as soon
as I put the full indexes back (which I need for other queries), the
query planner chooses them instead, and is slow.
[ experiments with a similar test case ... ] I think the reason why the
planner is overestimating the cost of using the partial index is that
9.1 and earlier fail to account for the partial-index predicate when
estimating the number of index rows that will be visited. Because the
partial-index predicate is so highly selective in this case, that
results in a significant overestimate of how much of the index will be
traversed.

We fixed this for 9.2 in
http://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=21a39de5809cd3050a37d2554323cc1d0cbeed9d
but did not want to risk back-patching such a behavioral change. If
you're stuck on 9.1 you might want to think about applying that as a
local patch though.

(BTW, the "fudge factor" change in that patch has been criticized
recently; we've changed it again already for 9.3 and might choose to
back-patch that into 9.2.3. But it's the rest of it that you care about
anyway.)

regards, tom lane
Richard Neill
2012-12-20 05:51:57 UTC
Dear Tom,

Thanks very much for your advice.
Post by Tom Lane
Post by Richard Neill
A psql session is below. This shows that, if I force the planner to use
the partial index, by dropping the others, then it's fast. But as soon
as I put the full indexes back (which I need for other queries), the
query planner chooses them instead, and is slow.
[ experiments with a similar test case ... ] I think the reason why the
planner is overestimating the cost of using the partial index is that
9.1 and earlier fail to account for the partial-index predicate when
estimating the number of index rows that will be visited. Because the
partial-index predicate is so highly selective in this case, that
results in a significant overestimate of how much of the index will be
traversed.
I think that seems likely to me.

I'll try out 9.2 and see if it helps. As it's a production server, I
have to wait for some downtime, probably Friday night before I can find
out - will report back.

Best wishes,

Richard
Richard Neill
2012-12-21 02:34:44 UTC
Dear Tom,

Thanks again for your help on this.
Post by Richard Neill
The problem is, when I now run my query, the planner ignores the
dedicated index "tbl_tracker_performance_1_idx", and instead uses both
of the full indexes... resulting in a much much slower query (9ms vs
0.08ms).
I've now installed 9.2. As you said, thanks to the change in 9.2 it
initially prefers the partial index.

BUT, after 1 cycle of inserting 500k rows, then deleting them all, then
starting to insert again, I find that the planner has reverted to the
former bad behaviour.

Reindexing only takes a couple of seconds, and restores correctness.

What's going on? Do I need to run reindex in a cron-job? I thought that
reindex wasn't "normally" needed, and that index bloat happened only
after every row had changed value hundreds of times.

Thanks,

Richard


---------------------
Here's the same session again.

[Please ignore the dreq_timestamp_1 check - I mistakenly failed to
simplify it out of the query, and now that I reindexed, I can't redo the
experiment. I don't think it makes any difference.]


fsc_log=> explain analyse select * from tbl_tracker WHERE
parcel_id_code='90820' AND exit_state IS NULL AND (dreq_timestamp_1 >
timestamp '2012-12-20 13:02:36.652' - INTERVAL '36 hours');

QUERY PLAN
---------------------------------------------------------------
Bitmap Heap Scan on tbl_tracker (cost=17.35..19.86 rows=1 width=174)
(actual time=8.056..8.056 rows=0 loops=1)
Recheck Cond: ((exit_state IS NULL) AND (parcel_id_code = 90820))
Filter: (dreq_timestamp_1 > '2012-12-19 01:02:36.652'::timestamp
without time zone)
-> BitmapAnd (cost=17.35..17.35 rows=1 width=0) (actual
time=8.053..8.053 rows=0 loops=1)
-> Bitmap Index Scan on tbl_tracker_exit_state_idx
(cost=0.00..8.36 rows=151 width=0) (actual time=7.946..7.946 rows=20277
loops=1)
Index Cond: (exit_state IS NULL)
-> Bitmap Index Scan on tbl_tracker_parcel_id_code_idx
(cost=0.00..8.73 rows=58 width=0) (actual time=0.025..0.025 rows=72 loops=1)
Index Cond: (parcel_id_code = 90820)
Total runtime: 8.090 ms
(9 rows)


fsc_log=> REINDEX index tbl_tracker_performance_1_idx;
#This only took a couple of seconds to do.

fsc_log=> explain analyse select * from tbl_tracker WHERE
parcel_id_code='90820' AND exit_state IS NULL AND (dreq_timestamp_1 >
timestamp '2012-12-20 13:02:36.652' - INTERVAL '36 hours');

QUERY PLAN
---------------------------------------------------------------


Index Scan using tbl_tracker_performance_1_idx on tbl_tracker
(cost=0.00..5.27 rows=1 width=174) (actual time=0.019..0.019 rows=0 loops=1)
Index Cond: (parcel_id_code = 90820)
Filter: (dreq_timestamp_1 > '2012-12-19 01:02:36.652'::timestamp
without time zone)
Total runtime: 0.047 ms
(4 rows)
Richard Neill
2012-12-21 03:16:20 UTC
Post by Richard Neill
Reindexing only takes a couple of seconds, and restores correctness.
Interestingly, the partial index (after reindexing) is only 16kB in
size; whereas the table is 1.1 GB, and the normal single-column indexes
are about 250MB in size.

In terms of what's physically happening in reality,

- tbl_tracker keeps a record of all books that move through the system
over a period of one month (at a rate of about 20/second, or 1
million/day), after which they are deleted.

- the partial index, tbl_tracker_performance_1_idx tracks only those
books which are currently "in flight" - books remain in flight for
about 200 seconds as they go round the machine.
(While in flight, these have exit_state = NULL)

- the partial index is used to overcome a design defect(*) in the
sorter machine, namely that it doesn't number each book uniquely,
but wraps the parcel_id_code every few hours. Worse, some books can
remain on the sorter for several days (if they jam), so the numbering
isn't a clean "wraparound", but more like a fragmented (and
occasionally lossy) filesystem.

- What I'm trying to do is trace the history of the books
through the system and assign each one a proper unique id.
So, if I see a book with "parcel_id_code = 37",
is it a new book (after pid wrap), or is it the same book I saw 1
minute ago, that hasn't exited the sorter?


So... is there some way to, for example, set a trigger that will reindex
every time the index exceeds 1000 rows?


Richard



(*)Readers of The Daily WTF might appreciate another curious anomaly:
this machine originally had an RS-232 port; it now uses ethernet, but
TxD and RxD use different TCP sockets on different network ports!
Jeff Janes
2012-12-21 05:15:17 UTC
Post by Richard Neill
- What I'm trying to do is trace the history of the books
through the system and assign each one a proper unique id.
So, if I see a book with "parcel_id_code = 37",
is it a new book (after pid wrap), or is it the same book I saw 1
minute ago, that hasn't exited the sorter?
I'm not sure how you are implementing this goal, but I don't think it is
best done by looping over all books (presumably from some other table?) and
issuing an individual query for each one, if that is what you are doing.
Some kind of bulk join would probably be more efficient.

Cheers,

Jeff
Richard Neill
2012-12-22 17:46:30 UTC
Post by Richard Neill
- What I'm trying to do is trace the history of the books
through the system and assign each one a proper unique id.
So, if I see a book with "parcel_id_code = 37",
is it a new book (after pid wrap), or is it the same book I saw 1
minute ago, that hasn't exited the sorter?
I'm not sure how you are implementing this goal, but I don't think it is
best done by looping over all books (presumably from some other table?)
and issuing an individual query for each one, if that is what you are
doing. Some kind of bulk join would probably be more efficient.
It would be nice to do a bulk join, but it's not possible: the query is
time-sensitive. Consider:

id/pkey  pid  timestamp  exit_state  destination

1        77   -24 hours  1           212
2        77   -18 hours  1           213
3        77   -12 hours  1           45
4        77    -6 hours  1           443
5        77     0 hours  null

[in future...]
5        77     0 hours  1           92
6        77     4 hours  null


At time +5 minutes, I receive a report that a book with parcel_id 77 has
successfully been delivered to destination 92. So, what I have to do is:

* First, find the id of the most recent book which had pid=77 and where
the exit state is null. (hopefully, but not always, this yields exactly
one row, which in this case is id=5)

* Then update the table to set the destination to 92, where the id=5.
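As a single statement, those two steps might look like this (a sketch only; I'm assuming the destination lands in actual_dest, that exit_state becomes 1 on delivery, and that "most recent" means the highest id):

```sql
UPDATE tbl_tracker
SET actual_dest = 92,
    exit_state  = 1
WHERE id = (SELECT id
            FROM tbl_tracker
            WHERE parcel_id_code = 77
              AND exit_state IS NULL
            ORDER BY id DESC   -- most recent sighting of this pid
            LIMIT 1);
```

The inner SELECT is exactly the query this thread is about, so it is the part that should be served by tbl_tracker_performance_1_idx.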


It's a rather cursed query, because:
- the sorter machine doesn't give me full info in each message, only
deltas, and I have to reconstruct the global state.
- pids are reused within hours, but don't increase monotonically
(more like drawing repeatedly from a shuffled deck, where cards
are only returned to the deck sporadically).
- some pids get double-reported
- 1% of books fall off the machine, or get stuck on it.
- occasionally, messages are lost.
- the sorter state isn't self-consistent (it can be restarted)


The tracker table is my attempt to consistently combine all the state we
know, and merge in the deltas as we receive messages from the sorter
machine. It ends up reflecting reality about 99% of the time.


Richard
Richard Neill
2012-12-24 18:37:11 UTC
Dear All,

I think periodic reindex may be the solution. Even after reducing the
autovacuum fraction to 0.05, the index still seems to bloat.

After another couple of days runtime, the index is using 11MB, and
I get a query that takes 2.448ms. Then I reindex (takes about 3 sec),
and the index falls to 16kB, and the query takes 0.035ms.

So... problem solved for me: I just have to reindex every few hours.
BUT, this suggests a few remaining things:


1. The documentation still suggests that reindex should not be needed in
"normal" operation... is this true? Or are the docs wrong? Or have I
got such an edge case? Does this suggest that an auto-reindexer would be
a useful feature?


2. Is there any way to force the planner to use (or ignore) a specific
index, for testing purposes, short of actually dropping the index?
This would be very useful for debugging, especially given that query
plans can only really be fully tested on production systems, and that
dropping indexes is rather a bad thing to do when live operation is
simultaneously happening on that server!

Thanks again for your help.

Best wishes,

Richard





fsc_log=> explain analyse select * from tbl_tracker WHERE
parcel_id_code='32453' AND exit_state IS NULL;

QUERY PLAN
------------------------------------------------------
Bitmap Heap Scan on tbl_tracker (cost=20.81..23.32 rows=1 width=174)
(actual time=2.408..2.408 rows=1 loops=1)
Recheck Cond: ((exit_state IS NULL) AND (parcel_id_code = 32453))
-> BitmapAnd (cost=20.81..20.81 rows=1 width=0) (actual
time=2.403..2.403 rows=0 loops=1)
-> Bitmap Index Scan on tbl_tracker_exit_state_idx
(cost=0.00..9.25 rows=132 width=0) (actual time=2.378..2.378 rows=5 loops=1)
Index Cond: (exit_state IS NULL)
-> Bitmap Index Scan on tbl_tracker_parcel_id_code_idx
(cost=0.00..11.30 rows=62 width=0) (actual time=0.022..0.022 rows=65
loops=1)
Index Cond: (parcel_id_code = 32453)
Total runtime: 2.448 ms


fsc_log => REINDEX;


fsc_log=> explain analyse select * from tbl_tracker WHERE
parcel_id_code='32453' AND exit_state IS NULL;

QUERY PLAN
-------------------------------------------------
Index Scan using tbl_tracker_performance_1_idx on tbl_tracker
(cost=0.00..5.27 rows=1 width=174) (actual time=0.007..0.008 rows=1 loops=1)
Index Cond: (parcel_id_code = 32453)
Total runtime: 0.035 ms
(3 rows)
John Rouillard
2012-12-24 20:42:52 UTC
Permalink
Post by Richard Neill
[...]
So... problem solved for me: I just have to reindex every few hours.
[...]
2. Is there any way to force the planner to use (or ignore) a
specific index, for testing purposes, short of actually dropping the
index?
This would be very useful for debugging, especially given that query
plans can only really be fully tested on production systems, and
that dropping indexes is rather a bad thing to do when live
operation is simultaneously happening on that server!
I believe that:

BEGIN;
drop index ....
explain analyze ...
explain analyze ...
ROLLBACK;

will do what you want. IIUC, Postgres allows DDL within transactions,
so the DROP INDEX can be rolled back, and the operations within the
transaction aren't visible to other queries running outside it.

http://wiki.postgresql.org/wiki/Transactional_DDL_in_PostgreSQL:_A_Competitive_Analysis

and

http://www.postgresql.org/docs/9.2/static/sql-dropindex.html
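
Applied to the indexes in this thread, the session might look something
like this (a sketch, untested; index and table names taken from earlier
messages):

```sql
BEGIN;
-- Hide the full indexes from the planner, for this transaction only.
DROP INDEX tbl_tracker_exit_state_idx;
DROP INDEX tbl_tracker_parcel_id_code_idx;

EXPLAIN ANALYZE
SELECT * FROM tbl_tracker
 WHERE parcel_id_code = '53030' AND exit_state IS NULL;

-- Undo the drops. Note that other sessions will block on the
-- ACCESS EXCLUSIVE lock until this rollback happens.
ROLLBACK;
```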
--
-- rouilj

John Rouillard System Administrator
Renesys Corporation 603-244-9084 (cell) 603-643-9300 x 111
Jeff Janes
2012-12-27 04:03:35 UTC
Permalink
Post by John Rouillard
Post by Richard Neill
[...]
So... problem solved for me: I just have to reindex every few hours.
[...]
2. Is there any way to force the planner to use (or ignore) a
specific index, for testing purposes, short of actually dropping the
index?
This would be very useful for debugging, especially given that query
plans can only really be fully tested on production systems, and
that dropping indexes is rather a bad thing to do when live
operation is simultaneously happening on that server!
BEGIN;
drop index ....
explain analyze ...
explain analyze ...
ROLLBACK;
There are two cautions here. One is that doing the drop index takes an
access exclusive lock on the table, and so brings all other connections to
a screeching halt. That is not much nicer to do on a production system
than actually dropping the index, so don't dilly-dally before doing the
rollback. Roll back first, then ruminate on the results of the explain.

Also, this will forcibly cancel any autovacuums occurring on the table. I
think one of the reasons he needs to reindex so much is that he is already
desperately short of vacuuming.

Cheers,

Jeff
Jeff Janes
2012-12-27 04:03:36 UTC
Permalink
Post by Richard Neill
Dear All,
I think periodic reindex may be the solution. Even after reducing the
autovacuum fraction to 0.05, the index still seems to bloat.
Since your index is so tiny compared to the table, I'd lower it even more.
0.0001, maybe.

However, vacuums of the same table can't overlap, so you can't get the
table vacuumed more often than the time it takes to run the vacuum (which
took around 4 minutes at default vacuum cost settings on a toy system),
and so you may need to lower the autovacuum_vacuum_cost_delay to 0 for
this table. (I suspect it is all in cache anyway, so the default settings
are too pessimistic.) Finally, you might need to lower autovacuum_naptime,
because the table also won't be vacuumed any more often than that.
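
The first two settings can be applied as per-table storage parameters
rather than cluster-wide; a sketch, with values that are illustrative
rather than tuned:

```sql
-- Vacuum tbl_tracker after ~0.01% of its rows turn over,
-- with cost-based throttling disabled for this table.
ALTER TABLE tbl_tracker SET (
    autovacuum_vacuum_scale_factor = 0.0001,
    autovacuum_vacuum_cost_delay = 0
);
```

(autovacuum_naptime, by contrast, can only be set cluster-wide in
postgresql.conf.)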
Post by Richard Neill
1. The documentation still suggests that reindex should not be needed in
"normal" operation... is this true? Or are the docs wrong? Or have I got
such an edge case?
Your case seems pretty far out there to me.

Cheers,

Jeff
Jeff Janes
2012-12-21 05:15:16 UTC
Permalink
Post by Richard Neill
Dear Tom,
Thanks againg for your help on this.
Post by Richard Neill
The problem is, when I now run my query, the planner ignores the
dedicated index "tbl_tracker_performance_1_idx", and instead uses both
of the full indexes... resulting in a much much slower query (9ms vs
0.08ms).
I've now installed 9.2. As you said, thanks to the change in 9.2 it
initially prefers the partial index.
BUT, after 1 cycle of inserting 500k rows, then deleting them all, then
starting to insert again, I find that the planner has reverted to the
former bad behaviour.
Presumably the real workload has this type of turnover happen one row at
a time, rather than all in one giant mass update transaction, right? That
makes a big difference in the way space is re-used.
Post by Richard Neill
Reindexing only takes a couple of seconds, and restores correctness.
Even your slow query is pretty fast. If you can't afford that, can you
afford to take an exclusive lock for a couple of seconds every few minutes?
Post by Richard Neill
What's going on? Do I need to run reindex in a cron-job? I thought that
reindex wasn't "normally" needed, and that index bloat happened only after
every row had changed value hundreds of times.
The partial index is highly leveraged. If every tuple in the table is
updated once, that amounts to every tuple in the index being updated 25,000
times.

For the same reason, it is probably not getting vacuumed often enough.
The default settings have the table vacuumed once 20% of its rows turn
over, but by then the partial index has turned over many, many times. You
could crank down the auto-vacuum settings for that table, or run manual
vacuum with a cron job.

Vacuum will not unbloat the index, but if you run it often enough it will
keep the bloat from getting too bad in the first place.

But what I think I'd do is change one of your full indexes to contain the
other column as well, and get rid of the partial index. It might not be
quite as efficient as the partial index might theoretically be, but it
should be pretty good and also be less fragile.
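
Concretely, that swap might look like the following (a sketch;
CONCURRENTLY avoids holding an exclusive lock while the new index builds,
at the cost of a slower build, and the new index name is illustrative):

```sql
CREATE INDEX CONCURRENTLY tbl_tracker_parcel_exit_idx
    ON tbl_tracker (parcel_id_code, exit_state);
-- Once the new index is in place:
DROP INDEX tbl_tracker_parcel_id_code_idx;
DROP INDEX tbl_tracker_performance_1_idx;
```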
Post by Richard Neill
-> Bitmap Index Scan on tbl_tracker_exit_state_idx
(cost=0.00..8.36 rows=151 width=0) (actual time=7.946..7.946 rows=20277
loops=1)
This is finding 100 times more rows than it thinks it will. If that could
be fixed, surely this plan would not look as good. But then, it would
probably just switch to another plan that is not the one you want, either.


Cheers,

Jeff
Richard Neill
2012-12-22 17:29:03 UTC
Permalink
Post by Richard Neill
I've now installed 9.2. As you said, thanks to the change in 9.2 it
initially prefers the partial index.
BUT, after 1 cycle of inserting 500k rows, then deleting them all,
then starting to insert again, I find that the planner has reverted
to the former bad behaviour.
Presumably the real workload has this type of turnover happen one row
at a time, rather than all in one giant mass update transaction, right?
That makes a big difference in the way space is re-used.
Sorry - I meant a "real" workload here. I replayed a whole day's worth
of real data into the DB, and that's what I meant by a cycle. Everything
was row-at-a-time.
(It currently takes about an hour to do this)
Post by Richard Neill
Reindexing only takes a couple of seconds, and restores correctness.
Even your slow query is pretty fast. If you can't afford that, can you
afford to take an exclusive lock for a couple of seconds every few minutes?
Yes, I can. If that's the root cause, I'll do that. But it seems to me
that I've stumbled upon some rather awkward behaviour that I need to
understand fully, and if the index is bloating that badly and that
quickly, then perhaps it's a PG bug (or at least cause for a logfile
warning).

BTW, The index has gone from 16kB to 4.5MB in 6 hours of runtime today.
It still only has 252 matching rows.
Post by Richard Neill
What's going on? Do I need to run reindex in a cron-job? I thought
that reindex wasn't "normally" needed, and that index bloat happened
only after every row had changed value hundreds of times.
The partial index is highly leveraged. If every tuple in the table is
updated once, that amounts to every tuple in the index being updated
25,000 times.
How so? That sounds like O(n^2) behaviour.
Post by Richard Neill
For the same reason, it is probably not getting vacuumed often enough.
The default settings have the table vacuumed once 20% of its rows
turn over, but by then the partial index has turned over many,
many times. You could crank down the auto-vacuum settings for that
table, or run manual vacuum with a cron job.
Vacuum will not unbloat the index, but if you run it often enough it
will keep the bloat from getting too bad in the first place.
Thanks. I've reduced autovacuum_vacuum_scale_factor from 0.2 to 0.05
(and set autovacuum_analyze_scale_factor = 0.05 for good measure)

As I understand it, both of these can run in parallel, and I have 7
cores usually idle, while the other is maxed out.
Post by Richard Neill
But what I think I'd do is change one of your full indexes to contain
the other column as well, and get rid of the partial index. It might
not be quite as efficient as the partial index might theoretically be,
but it should be pretty good and also be less fragile.
I'll try that.

Thanks,

Richard
Jeff Janes
2012-12-27 04:03:32 UTC
Permalink
Post by Richard Neill
Post by Jeff Janes
The partial index is highly leveraged. If every tuple in the table is
updated once, that amounts to every tuple in the index being updated
25,000 times.
How so? That sounds like O(n^2) behaviour.
If the table has 5 million rows while the index has 200 (active) rows at
any given time, then to update every row in the table to null and back
again would be 100% turn over of the table. But each such change would
lead to an addition and then a deletion from the index. So 100% turnover
of the table would be a 5 million / 200 = 25,000-fold turnover of the index.

There is some code that allows a btree index entry to get killed (and so
the slot to be reused) without any vacuum, if a scan follows that entry and
finds the corresponding tuple in the table no longer visible to anyone. I
have not examined this code, and don't know whether it is doing its job but
just isn't enough to prevent the bloat, or if for some reason it is not
applicable to your situation.


Cheers,

Jeff
Richard Neill
2012-12-27 10:52:01 UTC
Permalink
Post by Jeff Janes
The partial index is highly leveraged. If every tuple in the
table is updated once, that amounts to every tuple in the index
being updated 25,000 times.
How so? That sounds like O(n^2) behaviour.
If the table has 5 million rows while the index has 200 (active) rows at
any given time, then to update every row in the table to null and back
again would be 100% turn over of the table. But each such change would
lead to an addition and then a deletion from the index. So 100%
turnover of the table would be a 5 million / 200 = 25,000-fold turnover of
the index.
Sorry, I was being dense. I misread that as:
"every time a single tuple in the table is updated, the entire index
(every row) is updated".
Yes, of course your explanation makes sense.
Post by Jeff Janes
There is some code that allows a btree index entry to get killed (and so
the slot to be reused) without any vacuum, if a scan follows that entry
and finds the corresponding tuple in the table no longer visible to
anyone. I have not examined this code, and don't know whether it is
doing its job but just isn't enough to prevent the bloat, or if for some
reason it is not applicable to your situation.
It looks like my solution is going to be a REINDEX invoked from cron, or
maybe just every 100k inserts.
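
Since plain REINDEX holds an exclusive lock for the duration, a gentler
variant of that cron job on 9.2 is to build a replacement partial index
concurrently and swap it in (a sketch, untested; the temporary name is
illustrative):

```sql
CREATE INDEX CONCURRENTLY tbl_tracker_performance_1_tmp
    ON tbl_tracker (parcel_id_code) WHERE exit_state IS NULL;
DROP INDEX tbl_tracker_performance_1_idx;
ALTER INDEX tbl_tracker_performance_1_tmp
    RENAME TO tbl_tracker_performance_1_idx;
```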


In terms of trying to improve this behaviour for other PG users in the
future, are there any more diagnostics I can do for you? Having found a
special case, I'd like to help permanently resolve it if I can.


Thanks very much again.

Best wishes,

Richard
Jeff Janes
2012-12-27 16:17:16 UTC
Permalink
Post by Jeff Janes
Post by Richard Neill
-> Bitmap Index Scan on tbl_tracker_exit_state_idx
(cost=0.00..8.36 rows=151 width=0) (actual time=7.946..7.946 rows=20277
loops=1)
This is finding 100 times more rows than it thinks it will. If that could
be fixed, surely this plan would not look as good. But then, it would
probably just switch to another plan that is not the one you want, either.
I guess the issue here is that the histogram postgres uses to estimate the
number of rows that will be found is based on visible rows, and it is
correctly estimating the number of visible rows that will be found. And
that is the relevant thing to pass up to a higher join for its estimation.
But for estimating the number of blocks a given index scan will access,
the right thing would be the number of tuples visited, not the number of
them found to be visible. So that is where this plan goes systematically
wrong.

I guess the correct thing would be for postgres to keep two histograms, one
of all tuples and one of all visible tuples, and to produce different
selectivity estimates for different purposes. But I don't see that change
getting made. It is only meaningful in cases where there is a fundamental
skew in distribution between visible tuples and
invisible-but-as-yet-unvacuumed tuples.

I think that that fundamental skew is the source of both the
underestimation of the bitmap scan cost and the overestimation of the
partial index scan (although I can't get it to overestimate by anywhere
near the amount you were seeing).

I still think your best bet is to get rid of the partial index and trade
the full one on (parcel_id_code) for one on (parcel_id_code,exit_state). I
think that will be much less fragile than reindexing in a cron job.

Cheers,

Jeff
Richard Neill
2012-12-27 16:43:38 UTC
Permalink
Post by Jeff Janes
I still think your best bet is to get rid of the partial index and trade
the full one on (parcel_id_code) for one on (parcel_id_code,exit_state).
I think that will be much less fragile than reindexing in a cron job.
So, at the moment, I have 3 indexes:
full: parcel_id_code
full: exit_state
partial: parcel_id_code where exit_state is null

Am I right that when you suggest just a single, joint index
(parcel_id_code, exit_state)
instead of all 3 of the others,

it will allow me to run all of the following optimally?

1. SELECT * from tbl_tracker WHERE parcel_id_code=22345 AND exit_state
IS NULL

(this is the one we've been discussing)


2. SELECT * from tbl_tracker where parcel_id_code=44533

3. SELECT * from tbl_tracker where exit_code = 2

(2 and 3 are examples of queries I need to run for other purposes,
unrelated to this thread, but which use the other indexes.).


Thanks,

Richard
Tom Lane
2012-12-27 17:05:18 UTC
Permalink
Post by Richard Neill
full: parcel_id_code
full: exit_state
partial: parcel_id_code where exit_state is null
Am I right that when you suggest just a single, joint index
(parcel_id_code,exit_state)
instead of all 3 of the others,
I think he was just recommending replacing the first and third indexes.
Post by Richard Neill
1. SELECT * from tbl_tracker WHERE parcel_id_code=22345 AND exit_state
IS NULL
2. SELECT * from tbl_tracker where parcel_id_code=44533
3. SELECT * from tbl_tracker where exit_code = 2
You will need an index with exit_state as the first column to make #3
perform well --- at least, assuming that an index is going to help at
all anyway. The rule of thumb is that if a query is going to fetch
more than a few percent of a table, an index is not useful because
it's going to be touching most table pages anyway, so a seqscan will
win. I've forgotten now what you said the stats for exit_code values
other than null were, but it's quite possible that an index is useless
for #3.

These considerations are mostly covered in the manual:
http://www.postgresql.org/docs/9.2/static/indexes.html

regards, tom lane
Jeff Janes
2012-12-27 18:07:17 UTC
Permalink
Post by Richard Neill
Post by Jeff Janes
I still think your best bet is to get rid of the partial index and trade
the full one on (parcel_id_code) for one on (parcel_id_code,exit_state).
I think that will be much less fragile than reindexing in a cron job.
full: parcel_id_code
full: exit_state
partial: parcel_id_code where exit_state is null
Am I right that when you suggest just a single, joint index
(parcel_id_code,exit_state)
instead of all 3 of the others,
No, just instead of 1 and 3. You still need an index on (exit_state) in
order to efficiently satisfy query 3 below.

Alternatively, you could keep index 1, and replace 2 and 3 with one on
(exit_state, parcel_id_code). And in fact this might be the better way to
go, because a big problem you are facing is that the (exit_state) index is
looking falsely attractive, and the easiest way to overcome that is to get
rid of that index and replace it with one that can do everything that it
can do, but more.
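
In DDL terms, that alternative comes out roughly as (a sketch; the new
index name is illustrative, not from the thread):

```sql
-- Keep tbl_tracker_parcel_id_code_idx as-is; replace the other two:
CREATE INDEX CONCURRENTLY tbl_tracker_exit_parcel_idx
    ON tbl_tracker (exit_state, parcel_id_code);
DROP INDEX tbl_tracker_exit_state_idx;
DROP INDEX tbl_tracker_performance_1_idx;
```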

Theoretically there is a technique called "loose scan" or "skip scan"
which could allow one index on (exit_state, parcel_id_code) to replace
all 3 of the above, but postgres does not yet implement that technique. I
think there is a way to achieve the same thing using recursive SQL, but I
doubt it would be worth it, as too much index maintenance is not your root
problem.
Post by Richard Neill
3. SELECT * from tbl_tracker where exit_code = 2
Cheers,

Jeff