Discussion: Server stalls, all CPU 100% system time
Andre
2013-02-24 14:08:03 UTC
Hi,
Since our upgrade of hardware, OS, and Postgres, we have been experiencing server stalls under certain conditions. During a stall (up to 2 minutes) all CPUs show 100% system time, and all Postgres processes show BIND in top.
Usually the server has a load of under 0.5 (12 cores), with up to 30 connections and 200-400 tps.

Here is top -H during the stall:
Threads: 279 total, 25 running, 254 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.2 us, 99.8 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st

This is under normal circumstances:
Threads: 274 total, 1 running, 273 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.2 us, 0.2 sy, 0.0 ni, 99.6 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st

iostat shows under 0.3% load on the drives.

The stalls are mostly reproducible when the server is under its normal load and then 20-40 new processes start executing SQL statements.
Deactivating HT seemed to reduce the frequency and length of the stalls.
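
For anyone who wants to test the same thing without a BIOS change, a rough sketch using sysfs (run as root; the CPU number is only a placeholder for whatever your box reports as a sibling thread):

lscpu | grep -i 'thread'                                                  # "Thread(s) per core: 2" means HT is enabled
cat /sys/devices/system/cpu/cpu*/topology/thread_siblings_list | sort -u  # sibling pairs
echo 0 > /sys/devices/system/cpu/cpu12/online                             # take one (placeholder) sibling offline; echo 1 re-enables it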

The log shows entries for slow BINDs (8 seconds):
... LOG: duration: 8452.654 ms bind pdo_stmt_00000001: SELECT [20 columns selected] FROM users WHERE users.USERID=$1 LIMIT 1
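
For reference, these entries come from duration logging; a minimal sketch of the postgresql.conf settings that produce them (the 1000 ms threshold is just an example, not our actual value):

log_min_duration_statement = 1000   # log any statement (including the bind step) slower than 1000 ms
log_line_prefix = '%t [%p] '        # timestamp and backend PID, to correlate with top/vmstat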

I have tried to create a test case, but even starting 200 client processes that execute prepared statements does not reproduce this behaviour on a nearly idle server; it only stalls under the normal workload.
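
For anyone who wants to try the same thing, a rough sketch of such a load generator using pgbench in prepared-statement mode; the table name, column, id range, and database name are only placeholders based on the log example above.

bindtest.sql:
\setrandom uid 1 100000
SELECT userid FROM users WHERE userid = :uid LIMIT 1;

Run with 200 clients using prepared statements for five minutes:
pgbench -n -M prepared -f bindtest.sql -c 200 -j 12 -T 300 yourdb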

Hardware details:
2x Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz
64 GB RAM

Postgres version: 9.2.2 and 9.2.3

Linux: OpenSUSE 12.2 with Kernel 3.4.6

Postgres config:
max_connections = 200
effective_io_concurrency = 3
max_wal_senders = 2
wal_keep_segments = 2048
max_locks_per_transaction = 500
default_statistics_target = 100
checkpoint_completion_target = 0.9
maintenance_work_mem = 1GB
effective_cache_size = 60GB
work_mem = 384MB
wal_buffers = 8MB
checkpoint_segments = 64
shared_buffers = 15GB


This might be related to this topic: http://www.postgresql.org/message-id/CANQNgOquOGH7AkqW6ObPafrgxv=J3WsiZg-NgVvbki-***@mail.gmail.com (Poor performance after update from SLES11 SP1 to SP2)
I believe the old server was OpenSUSE 11.x.


Thanks for any hint on how to fix this or diagnose the problem.
Bèrto ëd Sèra
2013-02-24 14:13:13 UTC
and your /etc/sysctl.conf is?

Cheers
Bèrto
Post by Andre
Since our upgrade of hardware, OS, and Postgres, we have been experiencing
server stalls under certain conditions. During a stall (up to 2 minutes)
all CPUs show 100% system time, and all Postgres processes show BIND in top.
[...]
--
==============================
If Pac-Man had affected us as kids, we'd all be running around in a
darkened room munching pills and listening to repetitive music.
Andre
2013-02-24 14:39:34 UTC
Post by Bèrto ëd Sèra
and your /etc/sysctl.conf is?
/etc/sysctl.conf only had unrelated options set; here is the output of sysctl -a:

abi.vsyscall32 = 1
debug.exception-trace = 1
dev.hpet.max-user-freq = 64
kernel.acct = 4 2 30
kernel.acpi_video_flags = 0
kernel.auto_msgmni = 1
kernel.blk_iopoll = 1
kernel.cad_pid = 1
kernel.cap_last_cap = 35
kernel.compat-log = 1
kernel.core_pattern = core
kernel.core_pipe_limit = 0
kernel.core_uses_pid = 0
kernel.ctrl-alt-del = 0
kernel.dmesg_restrict = 0
kernel.hung_task_check_count = 4194304
kernel.hung_task_panic = 0
kernel.hung_task_timeout_secs = 0
kernel.hung_task_warnings = 10
kernel.io_delay_type = 0
kernel.keys.gc_delay = 300
kernel.keys.maxbytes = 20000
kernel.keys.maxkeys = 200
kernel.keys.root_maxbytes = 20000
kernel.keys.root_maxkeys = 200
kernel.kptr_restrict = 0
kernel.kstack_depth_to_print = 12
kernel.latencytop = 0
kernel.max_lock_depth = 1024
kernel.msgmax = 65536
kernel.msgmnb = 65536
kernel.msgmni = 32768
kernel.ngroups_max = 65536
kernel.nmi_watchdog = 1
kernel.ns_last_pid = 14714
kernel.osrelease = 3.4.6-2.10-desktop
kernel.ostype = Linux
kernel.overflowgid = 65534
kernel.overflowuid = 65534
kernel.panic = 0
kernel.panic_on_io_nmi = 0
kernel.panic_on_oops = 0
kernel.panic_on_unrecovered_nmi = 0
kernel.perf_event_max_sample_rate = 100000
kernel.perf_event_mlock_kb = 516
kernel.perf_event_paranoid = 1
kernel.pid_max = 32768
kernel.poweroff_cmd = /sbin/poweroff
kernel.print-fatal-signals = 0
kernel.printk = 1 4 1 7
kernel.printk_delay = 0
kernel.printk_ratelimit = 5
kernel.printk_ratelimit_burst = 10
kernel.random.boot_id = eaaed4b5-58cb-4f1d-be4a-62475d6ba312
kernel.random.entropy_avail = 2488
kernel.random.poolsize = 4096
kernel.random.read_wakeup_threshold = 64
kernel.random.uuid = bc40e670-c507-4b31-a310-f80eabda46b7
kernel.random.write_wakeup_threshold = 1024
kernel.randomize_va_space = 2
kernel.sched_autogroup_enabled = 1
kernel.sched_cfs_bandwidth_slice_us = 5000
kernel.sched_child_runs_first = 0
kernel.sched_domain.cpu0.domain0.busy_factor = 64
kernel.sched_domain.cpu0.domain0.busy_idx = 2
kernel.sched_domain.cpu0.domain0.cache_nice_tries = 1
kernel.sched_domain.cpu0.domain0.flags = 4655
kernel.sched_domain.cpu0.domain0.forkexec_idx = 0
kernel.sched_domain.cpu0.domain0.idle_idx = 0
kernel.sched_domain.cpu0.domain0.imbalance_pct = 125
kernel.sched_domain.cpu0.domain0.max_interval = 4
kernel.sched_domain.cpu0.domain0.min_interval = 1
kernel.sched_domain.cpu0.domain0.name = MC
kernel.sched_domain.cpu0.domain0.newidle_idx = 0
kernel.sched_domain.cpu0.domain0.wake_idx = 0
...
kernel.sched_latency_ns = 24000000
kernel.sched_migration_cost = 500000
kernel.sched_min_granularity_ns = 3000000
kernel.sched_nr_migrate = 32
kernel.sched_rt_period_us = 1000000
kernel.sched_rt_runtime_us = 950000
kernel.sched_shares_window = 10000000
kernel.sched_time_avg = 1000
kernel.sched_tunable_scaling = 1
kernel.sched_wakeup_granularity_ns = 4000000
kernel.sem = 250 256000 32 1024
kernel.shm_rmid_forced = 0
kernel.shmall = 1152921504606846720
kernel.shmmax = 18446744073709551615
kernel.shmmni = 4096
kernel.softlockup_panic = 0
kernel.suid_dumpable = 0
kernel.sysrq = 0
kernel.tainted = 1536
kernel.threads-max = 1032123
kernel.timer_migration = 1
kernel.unknown_nmi_panic = 0
kernel.usermodehelper.bset = 4294967295 4294967295
kernel.usermodehelper.inheritable = 4294967295 4294967295
kernel.version = #1 SMP PREEMPT Thu Jul 26 09:36:26 UTC 2012 (641c197)
kernel.watchdog = 1
kernel.watchdog_thresh = 10
vm.block_dump = 0
vm.dirty_background_bytes = 0
vm.dirty_background_ratio = 10
vm.dirty_bytes = 0
vm.dirty_expire_centisecs = 3000
vm.dirty_ratio = 20
vm.dirty_writeback_centisecs = 500
vm.drop_caches = 0
vm.extfrag_threshold = 500
vm.hugepages_treat_as_movable = 0
vm.hugetlb_shm_group = 0
vm.laptop_mode = 0
vm.legacy_va_layout = 0
vm.lowmem_reserve_ratio = 256 256 32
vm.max_map_count = 65530
vm.memory_failure_early_kill = 0
vm.memory_failure_recovery = 1
vm.min_free_kbytes = 67584
vm.min_slab_ratio = 5
vm.min_unmapped_ratio = 1
vm.mmap_min_addr = 65536
vm.nr_hugepages = 0
vm.nr_hugepages_mempolicy = 0
vm.nr_overcommit_hugepages = 0
vm.nr_pdflush_threads = 0
vm.numa_zonelist_order = default
vm.oom_dump_tasks = 1
vm.oom_kill_allocating_task = 0
vm.overcommit_memory = 0
vm.overcommit_ratio = 50
vm.page-cluster = 3
vm.panic_on_oom = 0
vm.percpu_pagelist_fraction = 0
vm.scan_unevictable_pages = 0
vm.stat_interval = 1
vm.swappiness = 60
vm.vfs_cache_pressure = 100
vm.zone_reclaim_mode = 0
Tom Lane
2013-02-24 14:45:59 UTC
Post by Andre
Since our upgrade of hardware, OS, and Postgres, we have been experiencing server stalls under certain conditions. During a stall (up to 2 minutes) all CPUs show 100% system time, and all Postgres processes show BIND in top.
Odd. I wonder if you are seeing some variant of the old context swap
storm problem. The "99.8% system time" reading is suggestive but hardly
conclusive. Does top's report of context swap rate go to the moon?

It would be interesting to strace a few of the server processes while
one of these events is happening, too.
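
A rough sketch of the sort of thing to capture (the PID is just whichever backend top shows as busy; the flags are the usual ones):

vmstat 1                                             # watch the cs (context switches) and in (interrupts) columns
strace -c -p <backend_pid>                           # summarize syscall counts/time for ~30 seconds, then Ctrl-C
strace -tt -p <backend_pid> -o /tmp/backend.trace    # or capture the individual calls with timestamps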

regards, tom lane
Andre
2013-02-26 12:53:17 UTC
Post by Tom Lane
Odd. I wonder if you are seeing some variant of the old context swap
storm problem. The "99.8% system time" reading is suggestive but hardly
conclusive. Does top's report of context swap rate go to the moon?
It would be interesting to strace a few of the server processes while
one of these events is happening, too.
I used vmstat to look at this; it showed around 5k context switches and 15k interrupts per second.
That seemed like too many interrupts, and after a bit of searching I found a site suggesting that the network card driver could cause this. After updating the kernel and the driver, the stalling is no longer reproducible.

Oddly enough, when I load test the server now I see 35k interrupts and 250k context switches per second, but no problems at all.
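
For anyone hitting the same thing, the usual places to check the interrupt distribution and the NIC driver version are something like this (the interface name is a placeholder for whatever your box uses):

ethtool -i eth0               # driver name, version and firmware of the card
grep eth0 /proc/interrupts    # are all the card's interrupts landing on one core?
vmstat 1                      # overall interrupt (in) and context switch (cs) rates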

Thanks for pointing me in the right direction.
Scott Marlowe
2013-02-24 16:43:35 UTC
Post by Andre
Since our upgrade of hardware, OS, and Postgres, we have been experiencing
server stalls under certain conditions. During a stall (up to 2 minutes)
all CPUs show 100% system time, and all Postgres processes show BIND in top.
Usually the server has a load of under 0.5 (12 cores), with up to 30
connections and 200-400 tps.
Threads: 279 total, 25 running, 254 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.2 us, 99.8 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
It might be useful to see a full page of top output as well. Also, turn
on sysstat data collection so you can do some post-mortem work.
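
Something along these lines gives you history to dig through afterwards (package name and sar flags are the standard sysstat ones; how the sa1/sa2 collectors get enabled varies by distro):

zypper install sysstat    # openSUSE; the collectors then sample periodically once enabled
sar -u                    # CPU usage history (user/system/iowait)
sar -w                    # context switches per second
sar -q                    # run queue length / load average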
Josh Berkus
2013-02-26 23:29:12 UTC
Andre,

Please see the related thread on this list, "High CPU usage / load
average after upgrading to Ubuntu 12.04". You may be experiencing some
of the same issues. General perspective seems to be that kernels 3.0
through 3.4 have serious performance issues.
--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
Scott Marlowe
2013-02-27 01:02:49 UTC
Post by Josh Berkus
Andre,
Please see the related thread on this list, "High CPU usage / load
average after upgrading to Ubuntu 12.04". You may be experiencing some
of the same issues. General perspective seems to be that kernels 3.0
through 3.4 have serious performance issues.
Someone commented they think it might be related to this kernel bug:

https://lkml.org/lkml/2012/10/9/210
Josh Berkus
2013-02-27 23:07:27 UTC
Post by Scott Marlowe
https://lkml.org/lkml/2012/10/9/210
We have some evidence that that is the case.
--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
Andre
2013-02-27 10:52:44 UTC
Post by Josh Berkus
Andre,
Please see the related thread on this list, "High CPU usage / load
average after upgrading to Ubuntu 12.04". You may be experiencing some
of the same issues. General perspective seems to be that kernels 3.0
through 3.4 have serious performance issues.
Josh,

I saw that thread, but it did not appear to describe the same symptoms I had. Where they have a high load average, I only saw spikes during which the server was unresponsive; during those spikes the load would jump to 50-70 (on 24 cores).
Anyway, after upgrading the kernel to 3.4.28 and installing the latest Intel network card driver, the problem seems to be gone.