SELECT is slow on smaller table?

Discussion:

(too old to reply)

Ao Jianwang

2013-02-28 15:11:16 UTC

Hi,

Does any one can tell me why the same query runs against on smaller data is
slower than bigger table. thanks very much.

I am using PostgreSQL9.1.8.

*t_apps_1 and t_estimate_1 are about 300M respectively, while *_list_1
about 10M more or less. According to the result, it need to read a lot of
blocks(112) from disk.*
explain (ANALYZE ON, BUFFERS ON, verbose on
) SELECT e.t_id, SUM(e.estimate) as est
FROM
t_estimate_list_1 l,
t_apps_list_1 rl,
t_apps_1 r,
t_estimate_1 e
WHERE
l.id = rl.dsf_id and
l.date = '2012-07-01' and
l.fed_id = 202 and
l.st_id = 143464 and
rl.cat_id = 12201 and
l.id = e.list_id and
rl.id = r.list_id and
r.t_id = e.t_id
GROUP BY e.t_id;
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=2529.91..2530.06 rows=15 width=8) (actual
time=1041.391..1041.409 rows=97 loops=1)
Buffers: shared hit=304 read=112
-> Nested Loop (cost=0.00..2529.84 rows=15 width=8) (actual
time=96.752..1041.145 rows=97 loops=1)
*Buffers: shared hit=304 read=112*
-> Nested Loop (cost=0.00..312.60 rows=242 width=12) (actual
time=62.035..70.239 rows=97 loops=1)
Buffers: shared hit=18 read=10
-> Nested Loop (cost=0.00..16.56 rows=1 width=12) (actual
time=19.520..19.521 rows=1 loops=1)
Buffers: shared hit=3 read=6
-> Index Scan using t_estimate_list_1_unique on
t_estimate_list_1 l (cost=0.00..8.27 rows=1 width=4) (actual
time=11.175..11.176 rows=1 loops=1)
Index Cond: ((date = '2012-07-01'::date) AND
(st_id = 143464) AND (fed_id = 202))
Buffers: shared hit=2 read=4
-> Index Scan using t_apps_list_1_unique on
t_apps_list_1 rl (cost=0.00..8.28 rows=1 width=8) (actual
time=8.339..8.339 rows=1 loops=1)
Index Cond: ((dsf_id = l.id) AND (cat_id =
12201))
Buffers: shared hit=1 read=2
-> Index Scan using t_apps_1_pkey on t_apps_1 r
(cost=0.00..288.56 rows=598 width=8) (actual time=42.513..50.676 rows=97
loops=1)
Index Cond: (list_id = rl.id)
Buffers: shared hit=15 read=4
-> Index Scan using t_estimate_1_pkey on t_estimate_1 e
(cost=0.00..9.15 rows=1 width=12) (actual time=10.006..10.007 rows=1
loops=97)
Index Cond: ((list_id = l.id) AND (t_id = r.t_id))
Buffers: shared hit=286 read=102
* Total runtime: 1041.511 ms*
(21 rows)

*The table *_30 are about 30 times larger than *_1 in the above SQL.
According to the result, it need to read a lot of blocks(22) from disk. *
explain (ANALYZE ON, BUFFERS ON
) SELECT e.t_id, SUM(e.estimate) as est
FROM
t_estimate_list_30 l,
t_apps_list_30 rl,
t_apps_30 r,
t_estimate_30 e
WHERE
l.id = rl.dsf_id and
l.date = '2012-07-01' and
l.fed_id = 202 and
l.st_id = 143464 and
rl.cat_id = 12201 and
l.id = e.list_id and
rl.id = r.list_id and
r.t_id = e.t_id
GROUP BY e.t_id;

QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=3494.89..3495.04 rows=15 width=8) (actual
time=160.612..160.632 rows=97 loops=1)
Buffers: shared hit=493 read=22
-> Nested Loop (cost=0.00..3494.81 rows=15 width=8) (actual
time=151.183..160.533 rows=97 loops=1)
*Buffers: shared hit=493 read=22*
-> Nested Loop (cost=0.00..431.42 rows=240 width=12) (actual
time=105.810..106.597 rows=97 loops=1)
Buffers: shared hit=20 read=10
-> Nested Loop (cost=0.00..16.58 rows=1 width=12) (actual
time=52.804..52.805 rows=1 loops=1)
Buffers: shared hit=4 read=6
-> Index Scan using t_estimate_list_5_unique on
t_estimate_list_5 l (cost=0.00..8.27 rows=1 width=4) (actual
time=19.846..19.846 rows=1 loops=1)
Index Cond: ((date = '2012-07-01'::date) AND
(st_id = 143464) AND (fed_id = 202))
Buffers: shared hit=2 read=4
-> Index Scan using t_apps_list_5_unique on
t_apps_list_5 rl (cost=0.00..8.30 rows=1 width=8) (actual
time=32.951..32.952 rows=1 loops=1)
Index Cond: ((dsf_id = l.id) AND (cat_id =
12201))
Buffers: shared hit=2 read=2
-> Index Scan using t_apps_5_pkey on t_apps_5 r
(cost=0.00..393.68 rows=1693 width=8) (actual time=53.004..53.755 rows=97
loops=1)
Index Cond: (list_id = rl.id)
Buffers: shared hit=16 read=4
-> Index Scan using t_estimate_5_pkey on t_estimate_5 e
(cost=0.00..12.75 rows=1 width=12) (actual time=0.555..0.555 rows=1
loops=97)
Index Cond: ((list_id = l.id) AND (t_id = r.t_id))
Buffers: shared hit=473 read=12
* Total runtime: 160.729 ms*
(21 rows)

Julien Cigar

2013-02-28 15:19:39 UTC

Permalink

Post by Ao Jianwang
Hi,
Does any one can tell me why the same query runs against on smaller
data is slower than bigger table. thanks very much.
I am using PostgreSQL9.1.8.
*t_apps_1 and t_estimate_1 are about 300M respectively, while *_list_1
about 10M more or less. According to the result, it need to read a lot
of blocks(112) from disk.*
explain (ANALYZE ON, BUFFERS ON, verbose on
) SELECT e.t_id, SUM(e.estimate) as est
FROM
t_estimate_list_1 l,
t_apps_list_1 rl,
t_apps_1 r,
t_estimate_1 e
WHERE
l.id <http://l.id> = rl.dsf_id and
l.date = '2012-07-01' and
l.fed_id = 202 and
l.st_id = 143464 and
rl.cat_id = 12201 and
l.id <http://l.id> = e.list_id and
rl.id <http://rl.id> = r.list_id and
r.t_id = e.t_id
GROUP BY e.t_id;
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=2529.91..2530.06 rows=15 width=8) (actual
time=1041.391..1041.409 rows=97 loops=1)
Buffers: shared hit=304 read=112
-> Nested Loop (cost=0.00..2529.84 rows=15 width=8) (actual
time=96.752..1041.145 rows=97 loops=1)
*Buffers: shared hit=304 read=112*
-> Nested Loop (cost=0.00..312.60 rows=242 width=12)
(actual time=62.035..70.239 rows=97 loops=1)
Buffers: shared hit=18 read=10
-> Nested Loop (cost=0.00..16.56 rows=1 width=12)
(actual time=19.520..19.521 rows=1 loops=1)
Buffers: shared hit=3 read=6
-> Index Scan using t_estimate_list_1_unique on t_estimate_list_1 l
(cost=0.00..8.27 rows=1 width=4) (actual time=11.175..11.176 rows=1
loops=1)
Index Cond: ((date = '2012-07-01'::date) AND (st_id = 143464) AND
(fed_id = 202))
Buffers: shared hit=2 read=4
-> Index Scan using t_apps_list_1_unique on t_apps_list_1 rl
(cost=0.00..8.28 rows=1 width=8) (actual time=8.339..8.339 rows=1
loops=1)
Index Cond: ((dsf_id = l.id <http://l.id>) AND (cat_id = 12201))
Buffers: shared hit=1 read=2
-> Index Scan using t_apps_1_pkey on t_apps_1 r
(cost=0.00..288.56 rows=598 width=8) (actual time=42.513..50.676
rows=97 loops=1)
Index Cond: (list_id = rl.id <http://rl.id>)
Buffers: shared hit=15 read=4
-> Index Scan using t_estimate_1_pkey on t_estimate_1 e
(cost=0.00..9.15 rows=1 width=12) (actual time=10.006..10.007 rows=1
loops=97)
Index Cond: ((list_id = l.id <http://l.id>) AND (t_id =
r.t_id))
Buffers: shared hit=286 read=102
* Total runtime: 1041.511 ms*
(21 rows)
*The table *_30 are about 30 times larger than *_1 in the above SQL.
According to the result, it need to read a lot of blocks(22) from disk. *
explain (ANALYZE ON, BUFFERS ON
) SELECT e.t_id, SUM(e.estimate) as est
FROM
t_estimate_list_30 l,
t_apps_list_30 rl,
t_apps_30 r,
t_estimate_30 e
WHERE
l.id <http://l.id> = rl.dsf_id and
l.date = '2012-07-01' and
l.fed_id = 202 and
l.st_id = 143464 and
rl.cat_id = 12201 and
l.id <http://l.id> = e.list_id and
rl.id <http://rl.id> = r.list_id and
r.t_id = e.t_id
GROUP BY e.t_id;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=3494.89..3495.04 rows=15 width=8) (actual
time=160.612..160.632 rows=97 loops=1)
Buffers: shared hit=493 read=22
-> Nested Loop (cost=0.00..3494.81 rows=15 width=8) (actual
time=151.183..160.533 rows=97 loops=1)
*Buffers: shared hit=493 read=22*
-> Nested Loop (cost=0.00..431.42 rows=240 width=12)
(actual time=105.810..106.597 rows=97 loops=1)
Buffers: shared hit=20 read=10
-> Nested Loop (cost=0.00..16.58 rows=1 width=12)
(actual time=52.804..52.805 rows=1 loops=1)
Buffers: shared hit=4 read=6
-> Index Scan using t_estimate_list_5_unique on t_estimate_list_5 l
(cost=0.00..8.27 rows=1 width=4) (actual time=19.846..19.846 rows=1
loops=1)
Index Cond: ((date = '2012-07-01'::date) AND (st_id = 143464) AND
(fed_id = 202))
Buffers: shared hit=2 read=4
-> Index Scan using t_apps_list_5_unique on t_apps_list_5 rl
(cost=0.00..8.30 rows=1 width=8) (actual time=32.951..32.952 rows=1
loops=1)
Index Cond: ((dsf_id = l.id <http://l.id>) AND (cat_id = 12201))
Buffers: shared hit=2 read=2
-> Index Scan using t_apps_5_pkey on t_apps_5 r
(cost=0.00..393.68 rows=1693 width=8) (actual time=53.004..53.755
rows=97 loops=1)
Index Cond: (list_id = rl.id <http://rl.id>)
Buffers: shared hit=16 read=4
-> Index Scan using t_estimate_5_pkey on t_estimate_5 e
(cost=0.00..12.75 rows=1 width=12) (actual time=0.555..0.555 rows=1
loops=97)
Index Cond: ((list_id = l.id <http://l.id>) AND (t_id =
r.t_id))
Buffers: shared hit=473 read=12
* Total runtime: 160.729 ms*
(21 rows)

Probably that somes pages have to be loaded in memory ...
It should be faster if you re-run the same query just after

--
No trees were killed in the creation of this message.
However, many electrons were terribly inconvenienced.

Ao Jianwang

2013-03-01 00:30:09 UTC

Permalink

Thanks Julien very much.
Two strange behaviors I found:
1) Even I restart the machine and restart the PostgreSQL, then I execute
the query, i still see the shared_hit. It seems when start PG, i will
automatically load the data in the cache of the last time?
2) After I rerun the query, the time for the smaller data is about 19ms,
while the time for the bigger data is about 17ms. And the trend is the time
for bigger data is always faster than the smaller data for about 1 to 2 ms

Any suggestions? thanks very much.

Post by Ao Jianwang
Hi,
Does any one can tell me why the same query runs against on smaller data
is slower than bigger table. thanks very much.
I am using PostgreSQL9.1.8.
*t_apps_1 and t_estimate_1 are about 300M respectively, while *_list_1
about 10M more or less. According to the result, it need to read a lot of
blocks(112) from disk.*
explain (ANALYZE ON, BUFFERS ON, verbose on
) SELECT e.t_id, SUM(e.estimate) as est
FROM
t_estimate_list_1 l,
t_apps_list_1 rl,
t_apps_1 r,
t_estimate_1 e
WHERE
l.id = rl.dsf_id and
l.date = '2012-07-01' and
l.fed_id = 202 and
l.st_id = 143464 and
rl.cat_id = 12201 and
l.id = e.list_id and
rl.id = r.list_id and
r.t_id = e.t_id
GROUP BY e.t_id;
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=2529.91..2530.06 rows=15 width=8) (actual
time=1041.391..1041.409 rows=97 loops=1)
Buffers: shared hit=304 read=112
-> Nested Loop (cost=0.00..2529.84 rows=15 width=8) (actual
time=96.752..1041.145 rows=97 loops=1)
*Buffers: shared hit=304 read=112*
-> Nested Loop (cost=0.00..312.60 rows=242 width=12) (actual
time=62.035..70.239 rows=97 loops=1)
Buffers: shared hit=18 read=10
-> Nested Loop (cost=0.00..16.56 rows=1 width=12) (actual
time=19.520..19.521 rows=1 loops=1)
Buffers: shared hit=3 read=6
-> Index Scan using t_estimate_list_1_unique on
t_estimate_list_1 l (cost=0.00..8.27 rows=1 width=4) (actual
time=11.175..11.176 rows=1 loops=1)
Index Cond: ((date = '2012-07-01'::date) AND
(st_id = 143464) AND (fed_id = 202))
Buffers: shared hit=2 read=4
-> Index Scan using t_apps_list_1_unique on
t_apps_list_1 rl (cost=0.00..8.28 rows=1 width=8) (actual
time=8.339..8.339 rows=1 loops=1)
Index Cond: ((dsf_id = l.id) AND (cat_id =
12201))
Buffers: shared hit=1 read=2
-> Index Scan using t_apps_1_pkey on t_apps_1 r
(cost=0.00..288.56 rows=598 width=8) (actual time=42.513..50.676 rows=97
loops=1)
Index Cond: (list_id = rl.id)
Buffers: shared hit=15 read=4
-> Index Scan using t_estimate_1_pkey on t_estimate_1 e
(cost=0.00..9.15 rows=1 width=12) (actual time=10.006..10.007 rows=1
loops=97)
Index Cond: ((list_id = l.id) AND (t_id = r.t_id))
Buffers: shared hit=286 read=102
* Total runtime: 1041.511 ms*
(21 rows)
*The table *_30 are about 30 times larger than *_1 in the above SQL.
According to the result, it need to read a lot of blocks(22) from disk. *
explain (ANALYZE ON, BUFFERS ON
) SELECT e.t_id, SUM(e.estimate) as est
FROM
t_estimate_list_30 l,
t_apps_list_30 rl,
t_apps_30 r,
t_estimate_30 e
WHERE
l.id = rl.dsf_id and
l.date = '2012-07-01' and
l.fed_id = 202 and
l.st_id = 143464 and
rl.cat_id = 12201 and
l.id = e.list_id and
rl.id = r.list_id and
r.t_id = e.t_id
GROUP BY e.t_id;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=3494.89..3495.04 rows=15 width=8) (actual
time=160.612..160.632 rows=97 loops=1)
Buffers: shared hit=493 read=22
-> Nested Loop (cost=0.00..3494.81 rows=15 width=8) (actual
time=151.183..160.533 rows=97 loops=1)
*Buffers: shared hit=493 read=22*
-> Nested Loop (cost=0.00..431.42 rows=240 width=12) (actual
time=105.810..106.597 rows=97 loops=1)
Buffers: shared hit=20 read=10
-> Nested Loop (cost=0.00..16.58 rows=1 width=12) (actual
time=52.804..52.805 rows=1 loops=1)
Buffers: shared hit=4 read=6
-> Index Scan using t_estimate_list_5_unique on
t_estimate_list_5 l (cost=0.00..8.27 rows=1 width=4) (actual
time=19.846..19.846 rows=1 loops=1)
Index Cond: ((date = '2012-07-01'::date) AND
(st_id = 143464) AND (fed_id = 202))
Buffers: shared hit=2 read=4
-> Index Scan using t_apps_list_5_unique on
t_apps_list_5 rl (cost=0.00..8.30 rows=1 width=8) (actual
time=32.951..32.952 rows=1 loops=1)
Index Cond: ((dsf_id = l.id) AND (cat_id =
12201))
Buffers: shared hit=2 read=2
-> Index Scan using t_apps_5_pkey on t_apps_5 r
(cost=0.00..393.68 rows=1693 width=8) (actual time=53.004..53.755 rows=97
loops=1)
Index Cond: (list_id = rl.id)
Buffers: shared hit=16 read=4
-> Index Scan using t_estimate_5_pkey on t_estimate_5 e
(cost=0.00..12.75 rows=1 width=12) (actual time=0.555..0.555 rows=1
loops=97)
Index Cond: ((list_id = l.id) AND (t_id = r.t_id))
Buffers: shared hit=473 read=12
* Total runtime: 160.729 ms*
(21 rows)
Probably that somes pages have to be loaded in memory ...
It should be faster if you re-run the same query just after
--
No trees were killed in the creation of this message.
However, many electrons were terribly inconvenienced.