Markus Schulz
2013-02-22 08:25:23 UTC
hello,
i have a strange and reproducible bug with some select queries and 64bit
postgresql builds (works fine on 32bit builds).
The postgres process will run with 100% cpu-load (no io-wait) and strace
will show endless lseek(..., SEEK_END) calls on one table for minutes.
lseek(28, 0, SEEK_END) = 26697728
lseek(28, 0, SEEK_END) = 26697728
lseek(28, 0, SEEK_END) = 26697728
...
the file-descriptor 28 points to the file for the webapps_base.Acquisition
table (see query/plan below).
Now the details:
The Query:
select count(ac.ID) as col_0_0_ from
webapps_base.Acquisition ac,
webapps_base.SalesPartnerStructure struc
where
struc.fk_SalesPartner_child=ac.fk_SalesPartner_ID
and struc.fk_SalesPartner_parent=200
and (ac.CreationDate between '2012-02-01' and '2013-01-31')
and ac.acquisitiondepot='STANDARD'
and ('2013-01-31' between struc.ValidFrom
and coalesce(struc.ValidTo, '2013-01-31'))
The Plan:
"Aggregate (cost=32617.11..32617.12 rows=1 width=8) (actual time=204.180..204.180 rows=1 loops=1)"
" -> Merge Join (cost=32232.01..32598.26 rows=7543 width=8) (actual time=172.882..202.218 rows=21111 loops=1)"
" Merge Cond: (ac.fk_salespartner_id = struc.fk_salespartner_child)"
" -> Sort (cost=5582.60..5635.69 rows=21235 width=16) (actual time=28.920..31.121 rows=21204 loops=1)"
" Sort Key: ac.fk_salespartner_id"
" Sort Method: quicksort Memory: 1763kB"
" -> Bitmap Heap Scan on acquisition ac (cost=395.26..4056.43 rows=21235 width=16) (actual time=3.064..15.868 rows=21223 loops=1)"
" Recheck Cond: ((creationdate >= '2012-02-01'::date) AND (creationdate <= '2013-01-31'::date))"
" Filter: ((acquisitiondepot)::text = 'STANDARD'::text)"
" -> Bitmap Index Scan on index_acquistion_creationdate (cost=0.00..389.95 rows=21267 width=0) (actual time=2.890..2.890 rows=21514 loops=1)"
" Index Cond: ((creationdate >= '2012-02-01'::date) AND (creationdate <= '2013-01-31'::date))"
" -> Sort (cost=26648.60..26742.61 rows=37606 width=8) (actual time=143.952..152.808 rows=131713 loops=1)"
" Sort Key: struc.fk_salespartner_child"
" Sort Method: quicksort Memory: 8452kB"
" -> Bitmap Heap Scan on salespartnerstructure struc (cost=3976.80..23790.79 rows=37606 width=8) (actual time=13.279..64.681 rows=114772 loops=1)"
" Recheck Cond: (fk_salespartner_parent = 200)"
" Filter: (('2013-01-31'::date >= validfrom) AND ('2013-01-31'::date <= COALESCE(validto, '2013-01-31'::date)))"
" -> Bitmap Index Scan on index_parent_salespartner (cost=0.00..3967.39 rows=114514 width=0) (actual time=13.065..13.065 rows=116479 loops=1)"
" Index Cond: (fk_salespartner_parent = 200)"
"Total runtime: 205.397 ms"
as you can see the query runs fine.
I can run this query from a bash-psql-while-loop/jdbc-cli-tool
endless without any problems.
so far so good.
But now i run the same query from:
JBoss EAP 5.1.2 with connection pooling and xa-datasource/two-phase-commits
(transactions on multiple datasources needed)
*and* *<prepared-statement-cache-size>1000</prepared-statement-cache-size>*
i can run the query four times with good performance and after that postgresql
starts with the strange lseek() behavior.
The query needs more then a minute to complete and during execution the
postgres process runs at 100% cpu load with lseek calls (straced).
If i flush the connection pool (close all open connections from the jboss
jmx-console) it works again for four calls.
These problem applies only to 64bit builds. If i run a 32bit postgresql
server it works fine.
We have tested the following environments:
- Debian Squeeze 64bit with Postgresql 9.1.[5,6,7] -> Bad behavior
- Debian Wheezy 64bit with Postgresql 9.1.8 64bit -> Bad behavior
- Ubuntu 12.04 LTS 64bit with Postgresql 9.1.8 64bit -> Bad behavior
- Windows 7 x64 with Postgresql 9.1.8 64bit -> Bad behavior
- Debian Wheezy 64bit with EnterpriseDB 9.2 64bit -> Bad behavior
- Debian Wheezy 64bit with Postgresql 9.1.8 32bit -> Good behavior
- Debian Wheezy 32bit with Postgresql 9.1.8 32bit -> Good behavior
as you can see all 64bit builds of postgresql are affected (independent from os-arch).
If i disable the prepared-statement-cache-size (remove it from -ds.xml)
it works on 64bit build too.
regards,
msc
i have a strange and reproducible bug with some select queries and 64bit
postgresql builds (works fine on 32bit builds).
The postgres process will run with 100% cpu-load (no io-wait) and strace
will show endless lseek(..., SEEK_END) calls on one table for minutes.
lseek(28, 0, SEEK_END) = 26697728
lseek(28, 0, SEEK_END) = 26697728
lseek(28, 0, SEEK_END) = 26697728
...
the file-descriptor 28 points to the file for the webapps_base.Acquisition
table (see query/plan below).
Now the details:
The Query:
select count(ac.ID) as col_0_0_ from
webapps_base.Acquisition ac,
webapps_base.SalesPartnerStructure struc
where
struc.fk_SalesPartner_child=ac.fk_SalesPartner_ID
and struc.fk_SalesPartner_parent=200
and (ac.CreationDate between '2012-02-01' and '2013-01-31')
and ac.acquisitiondepot='STANDARD'
and ('2013-01-31' between struc.ValidFrom
and coalesce(struc.ValidTo, '2013-01-31'))
The Plan:
"Aggregate (cost=32617.11..32617.12 rows=1 width=8) (actual time=204.180..204.180 rows=1 loops=1)"
" -> Merge Join (cost=32232.01..32598.26 rows=7543 width=8) (actual time=172.882..202.218 rows=21111 loops=1)"
" Merge Cond: (ac.fk_salespartner_id = struc.fk_salespartner_child)"
" -> Sort (cost=5582.60..5635.69 rows=21235 width=16) (actual time=28.920..31.121 rows=21204 loops=1)"
" Sort Key: ac.fk_salespartner_id"
" Sort Method: quicksort Memory: 1763kB"
" -> Bitmap Heap Scan on acquisition ac (cost=395.26..4056.43 rows=21235 width=16) (actual time=3.064..15.868 rows=21223 loops=1)"
" Recheck Cond: ((creationdate >= '2012-02-01'::date) AND (creationdate <= '2013-01-31'::date))"
" Filter: ((acquisitiondepot)::text = 'STANDARD'::text)"
" -> Bitmap Index Scan on index_acquistion_creationdate (cost=0.00..389.95 rows=21267 width=0) (actual time=2.890..2.890 rows=21514 loops=1)"
" Index Cond: ((creationdate >= '2012-02-01'::date) AND (creationdate <= '2013-01-31'::date))"
" -> Sort (cost=26648.60..26742.61 rows=37606 width=8) (actual time=143.952..152.808 rows=131713 loops=1)"
" Sort Key: struc.fk_salespartner_child"
" Sort Method: quicksort Memory: 8452kB"
" -> Bitmap Heap Scan on salespartnerstructure struc (cost=3976.80..23790.79 rows=37606 width=8) (actual time=13.279..64.681 rows=114772 loops=1)"
" Recheck Cond: (fk_salespartner_parent = 200)"
" Filter: (('2013-01-31'::date >= validfrom) AND ('2013-01-31'::date <= COALESCE(validto, '2013-01-31'::date)))"
" -> Bitmap Index Scan on index_parent_salespartner (cost=0.00..3967.39 rows=114514 width=0) (actual time=13.065..13.065 rows=116479 loops=1)"
" Index Cond: (fk_salespartner_parent = 200)"
"Total runtime: 205.397 ms"
as you can see the query runs fine.
I can run this query from a bash-psql-while-loop/jdbc-cli-tool
endless without any problems.
so far so good.
But now i run the same query from:
JBoss EAP 5.1.2 with connection pooling and xa-datasource/two-phase-commits
(transactions on multiple datasources needed)
*and* *<prepared-statement-cache-size>1000</prepared-statement-cache-size>*
i can run the query four times with good performance and after that postgresql
starts with the strange lseek() behavior.
The query needs more then a minute to complete and during execution the
postgres process runs at 100% cpu load with lseek calls (straced).
If i flush the connection pool (close all open connections from the jboss
jmx-console) it works again for four calls.
These problem applies only to 64bit builds. If i run a 32bit postgresql
server it works fine.
We have tested the following environments:
- Debian Squeeze 64bit with Postgresql 9.1.[5,6,7] -> Bad behavior
- Debian Wheezy 64bit with Postgresql 9.1.8 64bit -> Bad behavior
- Ubuntu 12.04 LTS 64bit with Postgresql 9.1.8 64bit -> Bad behavior
- Windows 7 x64 with Postgresql 9.1.8 64bit -> Bad behavior
- Debian Wheezy 64bit with EnterpriseDB 9.2 64bit -> Bad behavior
- Debian Wheezy 64bit with Postgresql 9.1.8 32bit -> Good behavior
- Debian Wheezy 32bit with Postgresql 9.1.8 32bit -> Good behavior
as you can see all 64bit builds of postgresql are affected (independent from os-arch).
If i disable the prepared-statement-cache-size (remove it from -ds.xml)
it works on 64bit build too.
regards,
msc
--
Sent via pgsql-performance mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Sent via pgsql-performance mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance