Richard Neill
2012-12-19 21:13:06 UTC
Dear All,
I've just joined this list, and I'd like to request some advice.
I have a table (1 GB in size) with 24 columns, and 5.6 million rows. Of
these, we're interested in two columns, parcel_id_code, and exit_state.
parcel_id_code has a fairly uniform distribution of integers
from 1-99999, it's never null.
exit_state has 3 possible values, 1,2 and null.
Almost all the rows are 1, about 0.1% have the value 2, and
only 153 rows are null
The query I'm trying to optimise looks like this:
SELECT * from tbl_tracker
WHERE parcel_id_code='53030' AND exit_state IS NULL;
So, I have a partial index:
"tbl_tracker_performance_1_idx" btree (parcel_id_code) WHERE
exit_state IS NULL
which works fine if it's the only index.
BUT, for other queries (unrelated to this question), I also have to have
full indexes on these columns:
"tbl_tracker_exit_state_idx" btree (exit_state)
"tbl_tracker_parcel_id_code_idx" btree (parcel_id_code)
The problem is, when I now run my query, the planner ignores the
dedicated index "tbl_tracker_performance_1_idx", and instead uses both
of the full indexes... resulting in a much much slower query (9ms vs
0.08ms).
A psql session is below. This shows that, if I force the planner to use
the partial index, by dropping the others, then it's fast. But as soon
as I put the full indexes back (which I need for other queries), the
query planner chooses them instead, and is slow.
Thanks very much for your help,
Richard
fsc_log => \d tbl_tracker
Column | Type | Modifiers
---------------------+--------------------------+------------------
id | bigint | not null default
nextval('master_id_seq'::regclass)
dreq_timestamp_1 | timestamp with time zone |
barcode_1 | character varying(13) |
barcode_2 | character varying(13) |
barcode_best | character varying(13) |
entrance_point | character varying(13) |
induct | character varying(5) |
entrance_state_x | integer |
dreq_count | integer |
parcel_id_code | integer |
host_id_code | bigint |
original_dest | integer |
drep_timestamp_n | timestamp with time zone |
actual_dest | integer |
exit_state | integer |
chute | integer |
original_dest_state | integer |
srep_timestamp | timestamp with time zone |
asn | character varying(9) |
is_asn_token | boolean |
track_state | integer |
warning | boolean |
Indexes:
"tbl_tracker_pkey" PRIMARY KEY, btree (id) CLUSTER
"tbl_tracker_barcode_best_idx" btree (barcode_best)
"tbl_tracker_chute_idx" btree (chute)
"tbl_tracker_drep_timestamp_n_idx" btree (drep_timestamp_n) WHERE
drep_timestamp_n IS NOT NULL
"tbl_tracker_dreq_timestamp_1_idx" btree (dreq_timestamp_1) WHERE
dreq_timestamp_1 IS NOT NULL
"tbl_tracker_exit_state_idx" btree (exit_state)
"tbl_tracker_parcel_id_code_idx" btree (parcel_id_code)
"tbl_tracker_performance_1_idx" btree (parcel_id_code) WHERE
exit_state IS NULL
"tbl_tracker_performance_2_idx" btree (host_id_code, id)
"tbl_tracker_performance_3_idx" btree (srep_timestamp) WHERE
exit_state = 1 AND srep_timestamp IS NOT NULL
"tbl_tracker_srep_timestamp_idx" btree (srep_timestamp) WHERE
srep_timestamp IS NOT NULL
fsc_log=> explain analyse select * from tbl_tracker where
parcel_id_code='53030' AND exit_state IS NULL;
QUERY PLAN
-----------------------------------------------------------------------
Bitmap Heap Scan on tbl_tracker (cost=8.32..10.84 rows=1 width=174)
(actual time=9.334..9.334 rows=0 loops=1)
Recheck Cond: ((parcel_id_code = 53030) AND (exit_state IS NULL))
-> BitmapAnd (cost=8.32..8.32 rows=1 width=0) (actual
time=9.329..9.329 rows=0 loops=1)
-> Bitmap Index Scan on tbl_tracker_parcel_id_code_idx
(cost=0.00..3.67 rows=57 width=0) (actual time=0.026..0.026 rows=65 loops=1)
Index Cond: (parcel_id_code = 53030)
-> Bitmap Index Scan on tbl_tracker_exit_state_idx
(cost=0.00..4.40 rows=150 width=0) (actual time=9.289..9.289 rows=93744
loops=1)
Index Cond: (exit_state IS NULL)
Total runtime: 9.366 ms
(8 rows)
fsc_log=> drop index tbl_tracker_exit_state_idx;
DROP INDEX
fsc_log=> explain analyse select * from tbl_tracker where
parcel_id_code='53030' AND exit_state IS NULL;
QUERY PLAN
----------------------------------------------------------------------------------------
Bitmap Heap Scan on tbl_tracker (cost=3.67..145.16 rows=1 width=174)
(actual time=0.646..0.646 rows=0 loops=1)
Recheck Cond: (parcel_id_code = 53030)
Filter: (exit_state IS NULL)
-> Bitmap Index Scan on tbl_tracker_parcel_id_code_idx
(cost=0.00..3.67 rows=57 width=0) (actual time=0.024..0.024 rows=65 loops=1)
Index Cond: (parcel_id_code = 53030)
Total runtime: 0.677 ms
(6 rows)
fsc_log=> drop index tbl_tracker_parcel_id_code_idx;
DROP INDEX
fsc_log=> explain analyse select * from tbl_tracker where
parcel_id_code='53030' AND exit_state IS NULL;
QUERY PLAN
--------------------------------------------------------------------------
Index Scan using tbl_tracker_performance_1_idx on tbl_tracker
(cost=0.00..5440.83 rows=1 width=174) (actual time=0.052..0.052 rows=0
loops=1)
Index Cond: (parcel_id_code = 53030)
Total runtime: 0.080 ms
(3 rows)
Server hardware: 8 core, 2.5 GHz, 24 GB, SSD in RAID-1.
Postgresql config (non-default):
version | PostgreSQL 9.1.6 on x86_64
checkpoint_segments | 128
client_encoding | UTF8
commit_delay | 50000
commit_siblings | 5
default_statistics_target | 5000
effective_cache_size | 12000MB
lc_collate | en_GB.UTF-8
lc_ctype | en_GB.UTF-8
log_line_prefix | %t
log_min_duration_statement | 50
maintenance_work_mem | 2GB
max_connections | 100
max_stack_depth | 4MB
port | 5432
random_page_cost | 2.5
server_encoding | UTF8
shared_buffers | 6000MB
ssl | on
standard_conforming_strings | off
temp_buffers | 128MB
TimeZone | GB
wal_buffers | 16MB
work_mem | 256MB
I've just joined this list, and I'd like to request some advice.
I have a table (1 GB in size) with 24 columns, and 5.6 million rows. Of
these, we're interested in two columns, parcel_id_code, and exit_state.
parcel_id_code has a fairly uniform distribution of integers
from 1-99999, it's never null.
exit_state has 3 possible values, 1,2 and null.
Almost all the rows are 1, about 0.1% have the value 2, and
only 153 rows are null
The query I'm trying to optimise looks like this:
SELECT * from tbl_tracker
WHERE parcel_id_code='53030' AND exit_state IS NULL;
So, I have a partial index:
"tbl_tracker_performance_1_idx" btree (parcel_id_code) WHERE
exit_state IS NULL
which works fine if it's the only index.
BUT, for other queries (unrelated to this question), I also have to have
full indexes on these columns:
"tbl_tracker_exit_state_idx" btree (exit_state)
"tbl_tracker_parcel_id_code_idx" btree (parcel_id_code)
The problem is, when I now run my query, the planner ignores the
dedicated index "tbl_tracker_performance_1_idx", and instead uses both
of the full indexes... resulting in a much much slower query (9ms vs
0.08ms).
A psql session is below. This shows that, if I force the planner to use
the partial index, by dropping the others, then it's fast. But as soon
as I put the full indexes back (which I need for other queries), the
query planner chooses them instead, and is slow.
Thanks very much for your help,
Richard
fsc_log => \d tbl_tracker
Column | Type | Modifiers
---------------------+--------------------------+------------------
id | bigint | not null default
nextval('master_id_seq'::regclass)
dreq_timestamp_1 | timestamp with time zone |
barcode_1 | character varying(13) |
barcode_2 | character varying(13) |
barcode_best | character varying(13) |
entrance_point | character varying(13) |
induct | character varying(5) |
entrance_state_x | integer |
dreq_count | integer |
parcel_id_code | integer |
host_id_code | bigint |
original_dest | integer |
drep_timestamp_n | timestamp with time zone |
actual_dest | integer |
exit_state | integer |
chute | integer |
original_dest_state | integer |
srep_timestamp | timestamp with time zone |
asn | character varying(9) |
is_asn_token | boolean |
track_state | integer |
warning | boolean |
Indexes:
"tbl_tracker_pkey" PRIMARY KEY, btree (id) CLUSTER
"tbl_tracker_barcode_best_idx" btree (barcode_best)
"tbl_tracker_chute_idx" btree (chute)
"tbl_tracker_drep_timestamp_n_idx" btree (drep_timestamp_n) WHERE
drep_timestamp_n IS NOT NULL
"tbl_tracker_dreq_timestamp_1_idx" btree (dreq_timestamp_1) WHERE
dreq_timestamp_1 IS NOT NULL
"tbl_tracker_exit_state_idx" btree (exit_state)
"tbl_tracker_parcel_id_code_idx" btree (parcel_id_code)
"tbl_tracker_performance_1_idx" btree (parcel_id_code) WHERE
exit_state IS NULL
"tbl_tracker_performance_2_idx" btree (host_id_code, id)
"tbl_tracker_performance_3_idx" btree (srep_timestamp) WHERE
exit_state = 1 AND srep_timestamp IS NOT NULL
"tbl_tracker_srep_timestamp_idx" btree (srep_timestamp) WHERE
srep_timestamp IS NOT NULL
fsc_log=> explain analyse select * from tbl_tracker where
parcel_id_code='53030' AND exit_state IS NULL;
QUERY PLAN
-----------------------------------------------------------------------
Bitmap Heap Scan on tbl_tracker (cost=8.32..10.84 rows=1 width=174)
(actual time=9.334..9.334 rows=0 loops=1)
Recheck Cond: ((parcel_id_code = 53030) AND (exit_state IS NULL))
-> BitmapAnd (cost=8.32..8.32 rows=1 width=0) (actual
time=9.329..9.329 rows=0 loops=1)
-> Bitmap Index Scan on tbl_tracker_parcel_id_code_idx
(cost=0.00..3.67 rows=57 width=0) (actual time=0.026..0.026 rows=65 loops=1)
Index Cond: (parcel_id_code = 53030)
-> Bitmap Index Scan on tbl_tracker_exit_state_idx
(cost=0.00..4.40 rows=150 width=0) (actual time=9.289..9.289 rows=93744
loops=1)
Index Cond: (exit_state IS NULL)
Total runtime: 9.366 ms
(8 rows)
fsc_log=> drop index tbl_tracker_exit_state_idx;
DROP INDEX
fsc_log=> explain analyse select * from tbl_tracker where
parcel_id_code='53030' AND exit_state IS NULL;
QUERY PLAN
----------------------------------------------------------------------------------------
Bitmap Heap Scan on tbl_tracker (cost=3.67..145.16 rows=1 width=174)
(actual time=0.646..0.646 rows=0 loops=1)
Recheck Cond: (parcel_id_code = 53030)
Filter: (exit_state IS NULL)
-> Bitmap Index Scan on tbl_tracker_parcel_id_code_idx
(cost=0.00..3.67 rows=57 width=0) (actual time=0.024..0.024 rows=65 loops=1)
Index Cond: (parcel_id_code = 53030)
Total runtime: 0.677 ms
(6 rows)
fsc_log=> drop index tbl_tracker_parcel_id_code_idx;
DROP INDEX
fsc_log=> explain analyse select * from tbl_tracker where
parcel_id_code='53030' AND exit_state IS NULL;
QUERY PLAN
--------------------------------------------------------------------------
Index Scan using tbl_tracker_performance_1_idx on tbl_tracker
(cost=0.00..5440.83 rows=1 width=174) (actual time=0.052..0.052 rows=0
loops=1)
Index Cond: (parcel_id_code = 53030)
Total runtime: 0.080 ms
(3 rows)
Server hardware: 8 core, 2.5 GHz, 24 GB, SSD in RAID-1.
Postgresql config (non-default):
version | PostgreSQL 9.1.6 on x86_64
checkpoint_segments | 128
client_encoding | UTF8
commit_delay | 50000
commit_siblings | 5
default_statistics_target | 5000
effective_cache_size | 12000MB
lc_collate | en_GB.UTF-8
lc_ctype | en_GB.UTF-8
log_line_prefix | %t
log_min_duration_statement | 50
maintenance_work_mem | 2GB
max_connections | 100
max_stack_depth | 4MB
port | 5432
random_page_cost | 2.5
server_encoding | UTF8
shared_buffers | 6000MB
ssl | on
standard_conforming_strings | off
temp_buffers | 128MB
TimeZone | GB
wal_buffers | 16MB
work_mem | 256MB
--
Sent via pgsql-performance mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Sent via pgsql-performance mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance