Discussion:
slow joins?
Joe Van Dyk
2013-04-06 01:38:01 UTC
On 9.2.4, I'm running two identical queries except for the value of a column in the WHERE clause. Postgres is picking very different query plans; the first is much slower than the second.

Any ideas on how I can speed this up? I have btree indexes for all the
columns used in the query.

explain analyze
SELECT COUNT(*)
FROM purchased_items pi
inner join line_items li on li.id = pi.line_item_id
inner join products on products.id = li.product_id
WHERE products.drop_shipper_id = 221;

 Aggregate  (cost=193356.31..193356.32 rows=1 width=0) (actual time=2425.225..2425.225 rows=1 loops=1)
   ->  Hash Join  (cost=78864.43..193160.41 rows=78360 width=0) (actual time=726.612..2424.206 rows=8413 loops=1)
         Hash Cond: (pi.line_item_id = li.id)
         ->  Seq Scan on purchased_items pi  (cost=0.00..60912.39 rows=3724639 width=4) (actual time=0.008..616.812 rows=3724639 loops=1)
         ->  Hash  (cost=77937.19..77937.19 rows=56499 width=4) (actual time=726.231..726.231 rows=8178 loops=1)
               Buckets: 4096  Batches: 4  Memory Usage: 73kB
               ->  Hash Join  (cost=1684.33..77937.19 rows=56499 width=4) (actual time=1.270..723.222 rows=8178 loops=1)
                     Hash Cond: (li.product_id = products.id)
                     ->  Seq Scan on line_items li  (cost=0.00..65617.18 rows=2685518 width=8) (actual time=0.081..392.926 rows=2685499 loops=1)
                     ->  Hash  (cost=1676.60..1676.60 rows=618 width=4) (actual time=0.835..0.835 rows=618 loops=1)
                           Buckets: 1024  Batches: 1  Memory Usage: 22kB
                           ->  Bitmap Heap Scan on products  (cost=13.07..1676.60 rows=618 width=4) (actual time=0.185..0.752 rows=618 loops=1)
                                 Recheck Cond: (drop_shipper_id = 221)
                                 ->  Bitmap Index Scan on index_products_on_drop_shipper_id  (cost=0.00..12.92 rows=618 width=0) (actual time=0.125..0.125 rows=618 loops=1)
                                       Index Cond: (drop_shipper_id = 221)
 Total runtime: 2425.302 ms


explain analyze
SELECT COUNT(*)
FROM purchased_items pi
inner join line_items li on li.id = pi.line_item_id
inner join products on products.id = li.product_id
WHERE products.drop_shipper_id = 2;




 Aggregate  (cost=29260.40..29260.41 rows=1 width=0) (actual time=0.906..0.906 rows=1 loops=1)
   ->  Nested Loop  (cost=0.00..29254.38 rows=2409 width=0) (actual time=0.029..0.877 rows=172 loops=1)
         ->  Nested Loop  (cost=0.00..16011.70 rows=1737 width=4) (actual time=0.021..0.383 rows=167 loops=1)
               ->  Index Scan using index_products_on_drop_shipper_id on products  (cost=0.00..80.41 rows=19 width=4) (actual time=0.010..0.074 rows=70 loops=1)
                     Index Cond: (drop_shipper_id = 2)
               ->  Index Scan using index_line_items_on_product_id on line_items li  (cost=0.00..835.70 rows=279 width=8) (actual time=0.002..0.004 rows=2 loops=70)
                     Index Cond: (product_id = products.id)
         ->  Index Only Scan using purchased_items_line_item_id_idx on purchased_items pi  (cost=0.00..7.60 rows=2 width=4) (actual time=0.002..0.003 rows=1 loops=167)
               Index Cond: (line_item_id = li.id)
               Heap Fetches: 5
 Total runtime: 0.955 ms
(11 rows)
Joe Van Dyk
2013-04-06 01:42:29 UTC
(https://gist.github.com/joevandyk/df0df703f3fda6d14ae1/raw/c15cae813913b7f8c35b24b467a0c732c0100d79/gistfile1.txt shows a non-wrapped version of the queries and plan)
Post by Joe Van Dyk
On 9.2.4, running two identical queries except for the value of a column in the WHERE clause. Postgres is picking very different query plans, the first is much slower than the second.
[...]
Greg Williamson
2013-04-06 01:54:10 UTC
Joe --
________________________________
Sent: Friday, April 5, 2013 6:42 PM
Subject: Re: [PERFORM] slow joins?
On 9.2.4, running two identical queries except for the value of a column in the WHERE clause. Postgres is picking very different query plans, the first is much slower than the second.
[...]
Does drop_shipper_id have a much larger number of rows which is making the planner want to avoid an indexed scan or otherwise prefer a sequential scan on products and on line_items?

What are the stats settings for these tables ?

HTH,

Greg Williamson
--
Sent via pgsql-performance mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Joe Van Dyk
2013-04-06 02:56:02 UTC
Post by Greg Williamson
[...]
Does drop_shipper_id have a much larger number of rows which is making the planner want to avoid an indexed scan or otherwise prefer a sequential scan on products and on line_items?
Assuming you mean products.drop_shipper_id? There are more rows matched for the first one vs the second one:
70 products rows match drop_shipper_id = 2, 618 match drop_shipper_id = 221.
Post by Greg Williamson
What are the stats settings for these tables ?
Whatever the defaults are.
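For reference, the default can be confirmed and the gathered statistics for the filter column inspected directly; this is a sketch assuming the table and column names from the queries above:

```sql
-- sample-size setting (the default in 9.2 is 100)
SHOW default_statistics_target;

-- what the planner currently believes about the filter column
SELECT n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'products'
  AND attname = 'drop_shipper_id';
```

The output depends on when ANALYZE last ran on the table.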
Post by Greg Williamson
HTH,
Greg WIlliamson
Greg Williamson
2013-04-06 03:32:56 UTC
Joe --
________________________________
Sent: Friday, April 5, 2013 7:56 PM
Subject: Re: [PERFORM] slow joins?
[...]
70 products rows match drop_shipper_id=2, 618 match drop_shipper_id=221.
What are the stats settings for these tables ?
Whatever the defaults are.
I mis-pasted the tables -- both line_items and purchased_items are getting sequential scans for the relevant rows; it is possible that there's enough difference to tip the planner to use sequential scans.

You might try increasing the stats being collected on those two tables, run analyze on all the tables in the query, and try it again.
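A sketch of that suggestion; the target value of 500 is an arbitrary illustration, not a recommendation:

```sql
-- raise the per-column sample size, then refresh the statistics
ALTER TABLE products   ALTER COLUMN drop_shipper_id SET STATISTICS 500;
ALTER TABLE line_items ALTER COLUMN product_id      SET STATISTICS 500;
ANALYZE products;
ANALYZE line_items;
ANALYZE purchased_items;
```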

GW
Joe Van Dyk
2013-04-06 01:50:30 UTC
If I disable sequential scans, hash joins, and merge joins, the query plans
become the same and performance on the first slow one is much improved.

Is there something else I can do to avoid this problem?

below also at
https://gist.github.com/joevandyk/34e31b3ad5cccb730a50/raw/8081a4298ba50ac93a86df97c1d0aae482ee7d2d/gistfile1.txt

 Aggregate  (cost=869360.53..869360.54 rows=1 width=0) (actual time=103.102..103.102 rows=1 loops=1)
   ->  Nested Loop  (cost=0.00..869164.63 rows=78360 width=0) (actual time=0.253..101.708 rows=8413 loops=1)
         ->  Nested Loop  (cost=0.00..438422.95 rows=56499 width=4) (actual time=0.157..51.766 rows=8178 loops=1)
               ->  Index Scan using index_products_on_drop_shipper_id on products  (cost=0.00..2312.56 rows=618 width=4) (actual time=0.087..6.318 rows=618 loops=1)
                     Index Cond: (drop_shipper_id = 221)
               ->  Index Scan using index_line_items_on_product_id on line_items li  (cost=0.00..702.89 rows=279 width=8) (actual time=0.010..0.069 rows=13 loops=618)
                     Index Cond: (product_id = products.id)
         ->  Index Only Scan using purchased_items_line_item_id_idx on purchased_items pi  (cost=0.00..7.60 rows=2 width=4) (actual time=0.005..0.005 rows=1 loops=8178)
               Index Cond: (line_item_id = li.id)
               Heap Fetches: 144
 Total runtime: 103.442 ms
(11 rows)
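For reference, the effect above can be reproduced per-session with the planner's enable_* switches; these are diagnostic knobs (they heavily penalize the disabled node types rather than forbidding them) and are not meant as a production fix:

```sql
SET enable_seqscan   = off;
SET enable_hashjoin  = off;
SET enable_mergejoin = off;

EXPLAIN ANALYZE
SELECT COUNT(*)
FROM purchased_items pi
INNER JOIN line_items li ON li.id = pi.line_item_id
INNER JOIN products ON products.id = li.product_id
WHERE products.drop_shipper_id = 221;

RESET ALL;  -- restore the session defaults afterwards
```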
Post by Joe Van Dyk
On 9.2.4, running two identical queries except for the value of a column in the WHERE clause. Postgres is picking very different query plans, the first is much slower than the second.
[...]
Julien Cigar
2013-04-06 10:53:37 UTC
try to increase cpu_tuple_cost to 0.1
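As a sketch, the change can be trialed per-session before touching postgresql.conf:

```sql
SET cpu_tuple_cost = 0.1;  -- default is 0.01

-- re-run the slow query and compare the chosen plan
EXPLAIN ANALYZE
SELECT COUNT(*)
FROM purchased_items pi
INNER JOIN line_items li ON li.id = pi.line_item_id
INNER JOIN products ON products.id = li.product_id
WHERE products.drop_shipper_id = 221;
```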
Post by Joe Van Dyk
If I disable sequential scans, hash joins, and merge joins, the query plans become the same and performance on the first slow one is much improved.
Is there something else I can do to avoid this problem?
[...]
Kevin Grittner
2013-04-06 14:22:29 UTC
Post by Julien Cigar
try to increase cpu_tuple_cost to 0.1
I agree that's on the right track, but possibly an overly blunt
tool for the job.  The following settings are likely to need
adjustment, IMO:

effective_cache_size: People often set this to somewhere in the
range of 50% to 75% of the RAM on the machine.  This setting does
not allocate RAM, but tells the planner how likely it is to find
things in cache for, say, repeated index access.  A higher setting
makes the random access involved in index scans seem like less of a
problem.

random_page_cost: You seem to have a very high cache hit ratio,
between shared_buffers and the OS cache.  To model this you should
decrease random_page_cost to something just above seq_page_cost or
equal to it.  To reflect the relatively low cost of reading a page
from the OS cache (compared to actually reading from disk) you
might want to reduce both of these below 1.  0.1 is a not-uncommon
setting for instances with the active portion of the database
well-cached.

cpu_tuple_cost: I always raise this; I think our default is just
too low to accurately model the cost of reading a row, compared to
the cost factors used for other things.  In combination with the
above changes I've never had to go beyond 0.03 to get a good plan.
I've pushed it to 0.05 to see if that put me near a tipping point
for a bad plan, and saw no ill effects.  I've never tried higher
than 0.05, so I can't speak to that.

In any event, your current cost settings aren't accurately modeling
actual costs in your environment for your workload.  You need to
adjust them.

One of the estimates was off, so increasing the statistics sample
size might help, but I suspect that you need to make adjustments
like the above in any event.
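Taken together, the adjustments above might be trialed like this; the values are illustrative starting points from the ranges mentioned, and the 8GB figure is a guess that assumes a machine with very roughly 12-16GB of RAM:

```sql
-- per-session trial; move to postgresql.conf once the plans look right
SET effective_cache_size = '8GB';  -- ~50-75% of machine RAM
SET seq_page_cost = 1.0;
SET random_page_cost = 1.1;        -- just above seq_page_cost for a well-cached DB
SET cpu_tuple_cost = 0.03;
```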
--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Julien Cigar
2013-04-06 14:43:50 UTC
Post by Kevin Grittner
effective_cache_size: People often set this to somewhere in the
range of 50% to 75% of the RAM on the machine. [...]
I agree that the very first thing to check is effective_cache_size
Post by Kevin Grittner
random_page_cost: You seem to have a very high cache hit ratio,
between shared_buffers and the OS cache. [...]
I would first raise cpu_tuple_cost rather than touch random_page_cost. Raising cpu_tuple_cost is a more "fine-grained" method for discouraging seqscans than random_page_cost is.
Post by Kevin Grittner
cpu_tuple_cost: I always raise this; I think our default is just
too low to accurately model the cost of reading a row. [...]
Yep, default cpu_tuple_cost is just too low ..