Ao Jianwang
2013-03-15 15:02:28 UTC
Hi Experts,
I found that if we join the partitioned master table with another small table, the running time is slow, whereas joining each child table with the small table directly is very fast. Any comments and suggestions are greatly appreciated.
For example, the par_list table is small (about 50k rows), while par_est is very large (about 400MB per day), so we partition par_est by day. However, the plan chosen for joining the master table with par_list is bad, and the query runs slowly. A good plan would join each partition with par_list separately and then aggregate the results.
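For reference, the layout is roughly the following (a simplified sketch: column types and index definitions are approximate, the real tables have more columns, and the intermediate monthly child such as par_est_2012_07 is omitted):

-- Simplified sketch of the layout (approximate column types; real tables have more columns).
CREATE TABLE par_list (
    id  integer PRIMARY KEY,
    fid integer NOT NULL,
    sid integer NOT NULL,
    cid integer NOT NULL
);
-- There is also an index par_list_sid_fid_cid_key on (sid, fid, cid), used in query 2 below.

CREATE TABLE par_est (
    list_id  integer NOT NULL,
    aid      integer NOT NULL,
    date     date    NOT NULL,
    estimate integer
);

-- One child table per day, attached to the master via inheritance plus a CHECK
-- constraint, so constraint exclusion can prune children outside the date range.
-- For example, for 2012-07-08:
CREATE TABLE par_est_2012_07_08 (
    CHECK (date = DATE '2012-07-08')
) INHERITS (par_est);
-- Each child also has its own primary-key index, e.g. par_est_2012_07_08_pkey,
-- which query 2 below hits with an Index Only Scan.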
1. Join the master table with the small table. This is slow.
dailyest=# explain (analyze on, buffers on)
dailyest-# SELECT e.date, max(e.estimate)
dailyest-# FROM
dailyest-# par_list l,
dailyest-# par_est e
dailyest-# WHERE
dailyest-# l.id = e.list_id and
dailyest-# e.date BETWEEN '2012-07-08' and '2012-07-10' and
dailyest-# l.fid = 1 and
dailyest-# l.sid = 143441 and
dailyest-# l.cid in (36, 39, 6000) and
dailyest-# e.aid = 333710667
dailyest-# GROUP BY e.date
dailyest-# ORDER BY e.date;
-----------------------
 GroupAggregate (cost=745326.86..745326.88 rows=1 width=8) (actual time=6281.364..6281.366 rows=3 loops=1)
   Buffers: shared hit=3 read=175869
   ->  Sort (cost=745326.86..745326.86 rows=1 width=8) (actual time=6281.358..6281.358 rows=6 loops=1)
         Sort Key: e.date
         Sort Method: quicksort Memory: 25kB
         Buffers: shared hit=3 read=175869
         ->  Nested Loop (cost=0.00..745326.85 rows=1 width=8) (actual time=1228.493..6281.349 rows=6 loops=1)
               Join Filter: (l.id = e.list_id)
               Rows Removed by Join Filter: 4040
               Buffers: shared hit=3 read=175869
               ->  Seq Scan on par_list l (cost=0.00..1213.10 rows=2 width=4) (actual time=0.010..38.272 rows=2 loops=1)
                     Filter: ((fid = 1) AND (sid = 143441) AND (cid = ANY ('{36,39,6000}'::integer[])))
                     Rows Removed by Filter: 50190
                     Buffers: shared hit=3 read=269
               ->  Materialize (cost=0.00..744102.56 rows=407 width=12) (actual time=9.707..3121.053 rows=2023 loops=2)
                     Buffers: shared read=175600
                     ->  Append (cost=0.00..744100.52 rows=407 width=12) (actual time=19.410..6240.044 rows=2023 loops=1)
                           Buffers: shared read=175600
                           ->  Seq Scan on par_est e (cost=0.00..0.00 rows=1 width=12) (actual time=0.001..0.001 rows=0 loops=1)
                                 Filter: ((date >= '2012-07-08'::date) AND (date <= '2012-07-10'::date) AND (aid = 333710667))
                           ->  Seq Scan on par_est_2012_07 e (cost=0.00..0.00 rows=1 width=12) (actual time=0.000..0.000 rows=0 loops=1)
                                 Filter: ((date >= '2012-07-08'::date) AND (date <= '2012-07-10'::date) AND (aid = 333710667))
                           ->  Seq Scan on par_est_2012_07_08 e (cost=0.00..247736.09 rows=135 width=12) (actual time=19.408..2088.627 rows=674 loops=1)
                                 Filter: ((date >= '2012-07-08'::date) AND (date <= '2012-07-10'::date) AND (aid = 333710667))
                                 Rows Removed by Filter: 10814878
                                 Buffers: shared read=58463
                           ->  Seq Scan on par_est_2012_07_09 e (cost=0.00..248008.81 rows=137 width=12) (actual time=6.390..1963.238 rows=676 loops=1)
                                 Filter: ((date >= '2012-07-08'::date) AND (date <= '2012-07-10'::date) AND (aid = 333710667))
                                 Rows Removed by Filter: 10826866
                                 Buffers: shared read=58528
                           ->  Seq Scan on par_est_2012_07_10 e (cost=0.00..248355.62 rows=133 width=12) (actual time=15.135..2187.312 rows=673 loops=1)
                                 Filter: ((date >= '2012-07-08'::date) AND (date <= '2012-07-10'::date) AND (aid = 333710667))
                                 Rows Removed by Filter: 10841989
                                 Buffers: shared read=58609
 Total runtime: 6281.444 ms
(35 rows)
2. Join each partition table with the small table (par_list) and UNION ALL the results. This runs very fast. However, it is not reasonable to hand-write a UNION of 180 SELECT statements (for example, when the date range runs from 2012-07-01 to 2012-12-31; a sketch of generating such a union mechanically is at the end of this mail). Any better suggestions?
dailyest=# explain (analyze on, buffers on)
dailyest-# SELECT e.date, max(e.estimate)
dailyest-# FROM
dailyest-# par_list l,
dailyest-# par_est_2012_07_08 e
dailyest-# WHERE
dailyest-# l.id = e.list_id and
dailyest-# e.date = '2012-07-08' and
dailyest-# l.fid = 1 and
dailyest-# l.sid = 143441 and
dailyest-# l.cid in (36, 39, 6000) and
dailyest-# e.aid = 333710667
dailyest-# GROUP BY e.date
dailyest-# UNION ALL
dailyest-# SELECT e.date, max(e.estimate)
dailyest-# FROM
dailyest-# par_list l,
dailyest-# par_est_2012_07_09 e
dailyest-# WHERE
dailyest-# l.id = e.list_id and
dailyest-# e.date = '2012-07-09' and
dailyest-# l.fid = 1 and
dailyest-# l.sid = 143441 and
dailyest-# l.cid in (36, 39, 6000) and
dailyest-# e.aid = 333710667
dailyest-# GROUP BY e.date
dailyest-# UNION ALL
dailyest-# SELECT e.date, max(e.estimate)
dailyest-# FROM
dailyest-# par_list l,
dailyest-# par_est_2012_07_10 e
dailyest-# WHERE
dailyest-# l.id = e.list_id and
dailyest-# e.date = '2012-07-10' and
dailyest-# l.fid = 1 and
dailyest-# l.sid = 143441 and
dailyest-# l.cid in (36, 39, 6000) and
dailyest-# e.aid = 333710667
dailyest-# GROUP BY e.date
dailyest-# ;
                                                                  QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
 Result (cost=0.00..91.49 rows=3 width=8) (actual time=83.736..254.912 rows=3 loops=1)
   Buffers: shared hit=27 read=28
   ->  Append (cost=0.00..91.49 rows=3 width=8) (actual time=83.735..254.910 rows=3 loops=1)
         Buffers: shared hit=27 read=28
         ->  GroupAggregate (cost=0.00..30.48 rows=1 width=8) (actual time=83.735..83.735 rows=1 loops=1)
               Buffers: shared hit=9 read=12
               ->  Nested Loop (cost=0.00..30.47 rows=1 width=8) (actual time=63.920..83.728 rows=2 loops=1)
                     Buffers: shared hit=9 read=12
                     ->  Index Scan using par_list_sid_fid_cid_key on par_list l (cost=0.00..18.56 rows=2 width=4) (actual time=1.540..1.550 rows=2 loops=1)
                           Index Cond: ((sid = 143441) AND (fid = 1) AND (cid = ANY ('{36,39,6000}'::integer[])))
                           Buffers: shared hit=7 read=4
                     ->  Index Only Scan using par_est_2012_07_08_pkey on par_est_2012_07_08 e (cost=0.00..5.94 rows=1 width=12) (actual time=41.083..41.083 rows=1 loops=2)
                           Index Cond: ((date = '2012-07-08'::date) AND (list_id = l.id) AND (aid = 333710667))
                           Heap Fetches: 0
                           Buffers: shared hit=2 read=8
         ->  GroupAggregate (cost=0.00..30.48 rows=1 width=8) (actual time=76.911..76.911 rows=1 loops=1)
               Buffers: shared hit=9 read=8
               ->  Nested Loop (cost=0.00..30.47 rows=1 width=8) (actual time=57.580..76.909 rows=2 loops=1)
                     Buffers: shared hit=9 read=8
                     ->  Index Scan using par_list_sid_fid_cid_key on par_list l (cost=0.00..18.56 rows=2 width=4) (actual time=0.007..0.016 rows=2 loops=1)
                           Index Cond: ((sid = 143441) AND (fid = 1) AND (cid = ANY ('{36,39,6000}'::integer[])))
                           Buffers: shared hit=7
                     ->  Index Only Scan using par_est_2012_07_09_pkey on par_est_2012_07_09 e (cost=0.00..5.94 rows=1 width=12) (actual time=38.440..38.442 rows=1 loops=2)
                           Index Cond: ((date = '2012-07-09'::date) AND (list_id = l.id) AND (aid = 333710667))
                           Heap Fetches: 0
                           Buffers: shared hit=2 read=8
         ->  GroupAggregate (cost=0.00..30.49 rows=1 width=8) (actual time=94.262..94.262 rows=1 loops=1)
               Buffers: shared hit=9 read=8
               ->  Nested Loop (cost=0.00..30.47 rows=1 width=8) (actual time=74.393..94.259 rows=2 loops=1)
                     Buffers: shared hit=9 read=8
                     ->  Index Scan using par_list_sid_fid_cid_key on par_list l (cost=0.00..18.56 rows=2 width=4) (actual time=0.007..0.017 rows=2 loops=1)
                           Index Cond: ((sid = 143441) AND (fid = 1) AND (cid = ANY ('{36,39,6000}'::integer[])))
                           Buffers: shared hit=7
                     ->  Index Only Scan using par_est_2012_07_10_pkey on par_est_2012_07_10 e (cost=0.00..5.95 rows=1 width=12) (actual time=47.116..47.117 rows=1 loops=2)
                           Index Cond: ((date = '2012-07-10'::date) AND (list_id = l.id) AND (aid = 333710667))
                           Heap Fetches: 0
                           Buffers: shared hit=2 read=8
 Total runtime: 255.074 ms
(38 rows)
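To make the second approach workable for a long date range, the best idea I have so far is to generate the UNION ALL text mechanically, one branch per daily child table, and execute that instead of hand-writing ~180 SELECTs. A rough sketch (assuming the par_est_YYYY_MM_DD naming shown above; format() and string_agg() need 9.1+, which the Index Only Scans in the plans suggest we have):

-- Rough sketch: build the per-partition UNION ALL query text for a date range,
-- one SELECT per daily child table (par_est_YYYY_MM_DD naming assumed).
SELECT string_agg(
           format($q$SELECT e.date, max(e.estimate)
               FROM par_list l, par_est_%s e
               WHERE l.id = e.list_id
                 AND e.date = %L
                 AND l.fid = 1
                 AND l.sid = 143441
                 AND l.cid IN (36, 39, 6000)
                 AND e.aid = 333710667
               GROUP BY e.date$q$,
               to_char(d, 'YYYY_MM_DD'), d::date),
           ' UNION ALL ') AS generated_sql
FROM generate_series(DATE '2012-07-01', DATE '2012-12-31', interval '1 day') AS g(d);
-- The resulting text can then be run from the application, or via EXECUTE in a DO block.

Still, I would much rather get the planner to choose a good plan against the master table directly.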