Mark Smith
2013-02-21 09:59:01 UTC
Hardware: IBM X3650 M3 (2 x Xeon X5680 6C 3.33GHz), 96GB RAM. IBM X3524
with RAID 10 ext4 (noatime,nodiratime,data=writeback,barrier=0) volumes for
pg_xlog / data / indexes.
Software: SLES 11 SP2 3.0.58-0.6.2-default x86_64, PostgreSQL 9.0.4.
max_connections = 1500
shared_buffers = 16GB
work_mem = 64MB
maintenance_work_mem = 256MB
wal_level = archive
synchronous_commit = off
wal_buffers = 16MB
checkpoint_segments = 32
checkpoint_completion_target = 0.9
effective_cache_size = 32GB
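As a rough sanity check on the settings above, the worst-case memory demand can be estimated in a few lines of Python. This is only a sketch under a deliberately simple assumption: each connection holds at most one work_mem-sized sort or hash at a time (a single complex query can in practice use several), so the figure understates the true worst case. The helper name worst_case_bytes is hypothetical.

```python
# Rough worst-case memory estimate for the postgresql.conf settings above.
# Simplifying assumption: every connection uses at most ONE work_mem
# allocation at a time. Real queries can use several (one per sort/hash
# node), so this is a lower bound on the true worst case.

MB = 1024 * 1024
GB = 1024 * MB


def worst_case_bytes(max_connections: int, work_mem: int, shared_buffers: int) -> int:
    """Return shared_buffers plus one work_mem allocation per connection."""
    return shared_buffers + max_connections * work_mem


settings = {
    "max_connections": 1500,
    "work_mem": 64 * MB,
    "shared_buffers": 16 * GB,
}

total = worst_case_bytes(**settings)
print(f"{total / GB:.1f} GB estimated vs 96 GB installed RAM")
```

With all 1500 permitted connections this comes to roughly 110 GB, more than the 96GB installed; with the more typical ~500 connections it is about 47 GB.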
Workload: OLTP, typically with 500+ concurrent database connections. The same
Linux instance also serves as the web server and application server. Far from
ideal, but it has worked well for 15 months.
Problem: We had been running PostgreSQL 9.0.4 on SLES11 SP1 (the last kernel
in use was 2.6.32-43-0.4), and performance had always been great. Since
updating from SLES11 SP1 to SP2, we now experience many database 'stalls':
normally 'instant' queries take many seconds, any query is slow, and even
just connecting to the database is slow. We have trialled PostgreSQL 9.2.3
under SLES11 SP2 with exactly the same results. During these periods the
machine remains completely responsive, but anything accessing the database is
extremely slow.
I have tried increasing sched_migration_cost from 500000 to 5000000 and
also tried setting sched_compat_yield to 1; neither appeared to make a
difference. I don't have the parameter 'sched_autogroup_enabled'.
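Because the set of scheduler tunables differs between kernel versions, a small script like the one below can record which of them exist on a given kernel and what they are currently set to, which makes SP1 vs SP2 comparisons easier. This is a sketch that assumes the tunables live under /proc/sys/kernel (the usual location for these names); the function name read_sysctls is hypothetical.

```python
from pathlib import Path
from typing import Dict, Optional, Sequence

# Scheduler tunables mentioned above. Which ones exist depends on the kernel
# version, so missing entries are reported as None instead of raising.
SCHED_SYSCTLS = (
    "sched_migration_cost",
    "sched_compat_yield",
    "sched_autogroup_enabled",
)


def read_sysctls(
    names: Sequence[str] = SCHED_SYSCTLS,
    root: Path = Path("/proc/sys/kernel"),
) -> Dict[str, Optional[str]]:
    """Return each tunable's current value, or None if it cannot be read."""
    values: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            values[name] = (root / name).read_text().strip()
        except OSError:
            # Covers both "file does not exist" and "permission denied".
            values[name] = None
    return values


if __name__ == "__main__":
    for name, value in read_sysctls().items():
        print(f"{name}: {value if value is not None else '(not present)'}")
```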
Nothing jumps out from top/iostat/sar/pg_stat_activity; however, I am very
far from expert in interpreting their output.
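One way to make a stall stand out is to log a few kernel counters once per second with timestamps, then line them up against the times queries were slow. The sketch below reads roughly what `vmstat 1` reports in its `cs`, `r` and `b` columns, taken directly from /proc/stat on Linux. The choice of fields is an assumption about what might be informative here, and the names parse_proc_stat and sample are hypothetical.

```python
import time
from typing import Dict

# Single-value fields in /proc/stat:
#   ctxt          - total context switches since boot
#   procs_running - processes currently runnable
#   procs_blocked - processes blocked waiting for I/O
FIELDS = ("ctxt", "procs_running", "procs_blocked")


def parse_proc_stat(text: str) -> Dict[str, int]:
    """Extract the fields listed in FIELDS from /proc/stat content."""
    result: Dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in FIELDS:
            result[parts[0]] = int(parts[1])
    return result


def read_proc_stat() -> Dict[str, int]:
    with open("/proc/stat") as f:
        return parse_proc_stat(f.read())


def sample(interval: float = 1.0, count: int = 10) -> None:
    """Print one timestamped line per interval (Linux only)."""
    prev = read_proc_stat()
    for _ in range(count):
        time.sleep(interval)
        cur = read_proc_stat()
        rate = (cur["ctxt"] - prev["ctxt"]) / interval
        print(f"{time.strftime('%H:%M:%S')} ctxt/s={rate:.0f} "
              f"running={cur['procs_running']} blocked={cur['procs_blocked']}")
        prev = cur


if __name__ == "__main__":
    sample()
```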
We have work underway to reduce our number of connections: although the
current number has always worked OK, perhaps it makes us particularly
vulnerable to kernel/scheduler changes.
I would be very grateful for any suggestions as to the best way to diagnose
the source of this problem, and/or any general recommendations.