Discussion:
Do I have a hardware or a software problem?
(too old to reply)
Niels Kristian Schjødt
2012-12-10 22:51:58 UTC
Permalink
#### Pitch ######################################################################################
I previously posted this question http://archives.postgresql.org/pgsql-performance/2012-11/msg00289.php about a performance issue with an update query.
The question evolved into a more general discussion about my setup, and about a lot of I/O wait that I was encountering. Since then, I have gotten a whole lot more familiar with measuring things, and now I "just" need some experienced eyes to judge which direction I should go in - do I have a hardware issue, or a software issue - and what action should I take?

##### My setup #############################################################################
The use case:
At night time we are doing a LOT of data maintenance, and hence the load on the database is very different from the daytime. However, we would like to be able to do some of it in the daytime; it is simply too "heavy" on the database as it is right now. The stats shown below are from one of those "heavy" load periods.

Hardware:
- 32GB RAM
- 8-core Xeon E3-1245 processor
- Two SEAGATE ST33000650NS drives (called sdc and sdd in the stats) in a software RAID1 array (called md2 in the stats)
- Two INTEL SSDSC2CW240A3 SSD drives (called sda and sdb in the stats) in a software RAID1 (called md3 in the stats)

Software:
PostgreSQL 9.2 running on 64-bit Ubuntu 12.04 with kernel 3.2

Configuration:
# postgresql.conf (a shortlist of everything changed from the default)
data_directory = '/var/lib/postgresql/9.2/main'
hba_file = '/etc/postgresql/9.2/main/pg_hba.conf'
ident_file = '/etc/postgresql/9.2/main/pg_ident.conf'
external_pid_file = '/var/run/postgresql/9.2-main.pid'
listen_addresses = '192.168.0.2, localhost'
port = 5432
max_connections = 300
unix_socket_directory = '/var/run/postgresql'
wal_level = hot_standby
synchronous_commit = off
archive_mode = on
archive_command = 'rsync -a %p ***@192.168.0.4:/var/lib/postgresql/9.2/wals/%f </dev/null'
max_wal_senders = 1
wal_keep_segments = 32
log_line_prefix = '%t [%p]: [%l-1] user=%u,db=%d '
datestyle = 'iso, mdy'
lc_monetary = 'en_US.UTF-8'
lc_numeric = 'en_US.UTF-8'
lc_time = 'en_US.UTF-8'
default_text_search_config = 'pg_catalog.english'
default_statistics_target = 100
maintenance_work_mem = 1GB
checkpoint_completion_target = 0.9
effective_cache_size = 22GB
work_mem = 160MB
wal_buffers = 4MB
checkpoint_segments = 100
shared_buffers = 4GB
checkpoint_timeout = 10min

The kernel has been tweaked like so:
vm.dirty_ratio = 10
vm.dirty_background_ratio = 1
kernel.shmmax = 8589934592
kernel.shmall = 17179869184
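For reference, settings like these typically go in /etc/sysctl.conf and can also be applied on the fly with sysctl, for example:

sudo sysctl -w vm.dirty_background_ratio=1
sudo sysctl -w vm.dirty_ratio=10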

The pg_xlog folder has been moved onto the SSD array (md3), and symlinked back into the postgres dir.
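Roughly speaking, the move looks like this (just a sketch - /ssd as the mount point for md3 is an assumed example, and the cluster has to be stopped while pg_xlog is moved):

sudo service postgresql stop
sudo mv /var/lib/postgresql/9.2/main/pg_xlog /ssd/pg_xlog
sudo ln -s /ssd/pg_xlog /var/lib/postgresql/9.2/main/pg_xlog
sudo chown -h postgres:postgres /var/lib/postgresql/9.2/main/pg_xlog
sudo service postgresql start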

##### The stats ###############################################################
These are the typical observations/stats I see in one of these periods:

1)
At top level this is what I see in new relic:
https://rpm.newrelic.com/public/charts/6ewGRle6bmc

2)
When the database is loaded like this, I see a lot of queries taking up to 1,000 times as long as they would when the database is not loaded so heavily.

3)
sudo iostat -dmx (typical usage)
Linux 3.2.0-33-generic (master-db) 12/10/2012 _x86_64_ (8 CPU)

Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 6.52 3.59 26.61 0.22 0.74 65.49 0.01 0.40 0.77 0.35 0.14 0.43
sdb 0.00 8.31 0.03 28.38 0.00 0.97 69.63 0.01 0.52 0.27 0.52 0.15 0.43
sdc 1.71 46.01 34.83 116.62 0.56 4.06 62.47 1.90 12.57 21.81 9.81 1.89 28.66
sdd 1.67 46.14 34.89 116.49 0.56 4.06 62.46 1.58 10.43 21.66 7.07 1.89 28.60
md1 0.00 0.00 0.00 0.00 0.00 0.00 2.69 0.00 0.00 0.00 0.00 0.00 0.00
md0 0.00 0.00 0.11 0.24 0.00 0.00 8.00 0.00 0.00 0.00 0.00 0.00 0.00
md2 0.00 0.00 72.99 161.95 1.11 4.06 45.10 0.00 0.00 0.00 0.00 0.00 0.00
md3 0.00 0.00 0.05 32.32 0.00 0.74 47.00 0.00 0.00 0.00 0.00 0.00 0.00

4)
sudo iotop -oa (running for about a minute or so)
TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND
292 be/4 root 0.00 B 0.00 B 0.00 % 99.33 % [md2_raid1]
2815 be/4 postgres 19.51 M 25.90 M 0.00 % 45.49 % postgres: autovacuum worker process production
32553 be/4 postgres 45.74 M 9.38 M 0.00 % 37.89 % postgres: user production 192.168.0.3(58866) UPDATE
32570 be/4 postgres 6.91 M 35.02 M 0.00 % 16.71 % postgres: user production 192.168.0.3(35547) idle
32575 be/4 postgres 4.06 M 43.90 M 0.00 % 16.62 % postgres: user production 192.168.0.3(35561) SELECT
31673 be/4 postgres 4.14 M 52.16 M 0.00 % 16.24 % postgres: user production 192.168.0.3(39112) idle
32566 be/4 postgres 4.73 M 44.95 M 0.00 % 15.66 % postgres: user production 192.168.0.3(35531) idle
32568 be/4 postgres 4.50 M 33.84 M 0.00 % 14.62 % postgres: user production 192.168.0.3(35543) SELECT
32573 be/4 postgres 3.20 M 34.44 M 0.00 % 13.98 % postgres: user production 192.168.0.3(35559) idle
31590 be/4 postgres 3.23 M 29.72 M 0.00 % 13.90 % postgres: user production 192.168.0.3(50690) idle in transaction
32577 be/4 postgres 5.09 M 25.54 M 0.00 % 13.63 % postgres: user production 192.168.0.3(35563) idle
32565 be/4 postgres 2.06 M 35.93 M 0.00 % 13.41 % postgres: user production 192.168.0.3(35529) SELECT
32546 be/4 postgres 4.48 M 36.49 M 0.00 % 13.39 % postgres: user production 192.168.0.3(56927) UPDATE waiting
32569 be/4 postgres 3.50 M 26.75 M 0.00 % 12.82 % postgres: user production 192.168.0.3(35545) INSERT
31671 be/4 postgres 4.58 M 24.45 M 0.00 % 12.76 % postgres: user production 192.168.0.3(34841) idle in transaction
32551 be/4 postgres 3.26 M 31.77 M 0.00 % 12.06 % postgres: user production 192.168.0.3(58864) idle in transaction
32574 be/4 postgres 5.32 M 32.92 M 0.00 % 11.70 % postgres: user production 192.168.0.3(35560) idle
32572 be/4 postgres 3.00 M 32.66 M 0.00 % 11.66 % postgres: user production 192.168.0.3(35558) UPDATE
32560 be/4 postgres 5.12 M 25.89 M 0.00 % 11.52 % postgres: user production 192.168.0.3(33886) SELECT
32567 be/4 postgres 4.66 M 36.47 M 0.00 % 11.44 % postgres: user production 192.168.0.3(35534) SELECT
32571 be/4 postgres 2.86 M 31.27 M 0.00 % 11.31 % postgres: user production 192.168.0.3(35557) SELECT
32552 be/4 postgres 4.38 M 28.75 M 0.00 % 11.09 % postgres: user production 192.168.0.3(58865) idle in transaction
32554 be/4 postgres 3.69 M 30.21 M 0.00 % 10.90 % postgres: user production 192.168.0.3(58870) UPDATE
339 be/3 root 0.00 B 2.29 M 0.00 % 9.81 % [jbd2/md2-8]
32576 be/4 postgres 3.37 M 19.91 M 0.00 % 9.73 % postgres: user production 192.168.0.3(35562) idle
32555 be/4 postgres 3.09 M 31.96 M 0.00 % 9.02 % postgres: user production 192.168.0.3(58875) SELECT
27548 be/4 postgres 0.00 B 97.12 M 0.00 % 7.41 % postgres: writer process
31445 be/4 postgres 924.00 K 27.35 M 0.00 % 7.11 % postgres: user production 192.168.0.1(34536) idle
31443 be/4 postgres 2.54 M 4.56 M 0.00 % 6.32 % postgres: user production 192.168.0.1(34508) idle
31459 be/4 postgres 1480.00 K 21.36 M 0.00 % 5.63 % postgres: user production 192.168.0.1(34543) idle
1801 be/4 postgres 1896.00 K 10.89 M 0.00 % 5.57 % postgres: user production 192.168.0.3(34177) idle
32763 be/4 postgres 1696.00 K 6.95 M 0.00 % 5.33 % postgres: user production 192.168.0.3(57984) SELECT
1800 be/4 postgres 2.46 M 5.13 M 0.00 % 5.24 % postgres: user production 192.168.0.3(34175) SELECT
1803 be/4 postgres 1816.00 K 9.09 M 0.00 % 5.16 % postgres: user production 192.168.0.3(34206) idle
32578 be/4 postgres 2.57 M 11.62 M 0.00 % 5.06 % postgres: user production 192.168.0.3(35564) SELECT
31440 be/4 postgres 3.02 M 4.04 M 0.00 % 4.65 % postgres: user production 192.168.0.1(34463) idle
32605 be/4 postgres 1844.00 K 11.82 M 0.00 % 4.49 % postgres: user production 192.168.0.3(40399) idle
27547 be/4 postgres 0.00 B 0.00 B 0.00 % 3.93 % postgres: checkpointer process
31356 be/4 postgres 1368.00 K 3.27 M 0.00 % 3.93 % postgres: user production 192.168.0.1(34450) idle
32542 be/4 postgres 1180.00 K 6.05 M 0.00 % 3.90 % postgres: user production 192.168.0.3(56859) idle
32523 be/4 postgres 1088.00 K 4.33 M 0.00 % 3.59 % postgres: user production 192.168.0.3(48164) idle
32606 be/4 postgres 1964.00 K 6.94 M 0.00 % 3.51 % postgres: user production 192.168.0.3(40426) SELECT
31466 be/4 postgres 1596.00 K 3.11 M 0.00 % 3.47 % postgres: user production 192.168.0.1(34550) idle
32544 be/4 postgres 1184.00 K 4.25 M 0.00 % 3.38 % postgres: user production 192.168.0.3(56861) idle
31458 be/4 postgres 1088.00 K 1528.00 K 0.00 % 3.33 % postgres: user production 192.168.0.1(34541) idle
31444 be/4 postgres 884.00 K 4.23 M 0.00 % 3.27 % postgres: user production 192.168.0.1(34510) idle
32522 be/4 postgres 408.00 K 2.98 M 0.00 % 3.27 % postgres: user production 192.168.0.5(38361) idle
32762 be/4 postgres 1156.00 K 5.28 M 0.00 % 3.20 % postgres: user production 192.168.0.3(57962) idle
32582 be/4 postgres 1084.00 K 3.38 M 0.00 % 2.86 % postgres: user production 192.168.0.5(43104) idle
31353 be/4 postgres 2.04 M 3.02 M 0.00 % 2.82 % postgres: user production 192.168.0.1(34444) idle
31441 be/4 postgres 700.00 K 2.68 M 0.00 % 2.64 % postgres: user production 192.168.0.1(34465) idle
31462 be/4 postgres 980.00 K 3.50 M 0.00 % 2.57 % postgres: user production 192.168.0.1(34547) idle
32709 be/4 postgres 428.00 K 3.23 M 0.00 % 2.56 % postgres: user production 192.168.0.5(34323) idle
685 be/4 postgres 748.00 K 3.59 M 0.00 % 2.41 % postgres: user production 192.168.0.3(34911) idle
683 be/4 postgres 728.00 K 3.19 M 0.00 % 2.38 % postgres: user production 192.168.0.3(34868) idle
32765 be/4 postgres 464.00 K 3.76 M 0.00 % 2.21 % postgres: user production 192.168.0.3(58074) idle
32760 be/4 postgres 808.00 K 6.18 M 0.00 % 2.16 % postgres: user production 192.168.0.3(57958) idle
1912 be/4 postgres 372.00 K 3.03 M 0.00 % 2.16 % postgres: user production 192.168.0.5(33743) idle
31446 be/4 postgres 1004.00 K 2.09 M 0.00 % 2.16 % postgres: user production 192.168.0.1(34539) idle
31460 be/4 postgres 584.00 K 2.74 M 0.00 % 2.10 % postgres: user production 192.168.0.1(34545) idle

5) vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
1 1 573424 321080 27124 28504352 0 0 143 618 0 4 2 0 91 7
0 1 573424 320764 27124 28504496 0 0 104 15654 3788 4961 1 0 85 14
0 1 573424 320684 27124 28504616 0 0 276 12736 4099 5374 0 1 84 15
0 1 573424 319672 27124 28504900 0 0 80 7746 3624 4949 2 0 82 16
0 1 573424 319180 27124 28504972 0 0 36 12489 3653 4761 2 0 86 12
0 1 573424 318184 27132 28505000 0 0 8 10482 3413 4898 0 0 87 13
0 1 573424 318424 27132 28505016 0 0 0 9564 2746 4290 0 0 87 13
0 1 573424 318308 27168 28505016 0 0 36 10562 1895 2149 0 0 87 12
0 3 573424 318208 27168 28505020 0 0 84 18529 3035 3265 1 0 85 14
0 1 573424 318732 27176 28505080 0 0 84 14574 2986 3231 0 0 84 16
0 2 573424 317588 27176 28505184 0 0 4 6681 1991 2207 2 1 86 12
0 1 573424 316852 27176 28505260 0 0 76 7670 2910 3996 2 1 85 13
0 1 573424 316632 27184 28505256 0 0 0 7186 2661 3740 0 0 87 12
0 1 573424 316720 27188 28505260 0 0 0 2590 1731 2474 0 0 88 12
0 1 573424 314252 27192 28505696 0 0 460 11612 1757 2431 0 0 82 18
0 2 573424 313504 27192 28505724 0 0 0 19656 1775 2099 0 0 83 17
0 3 573424 313300 27196 28505780 0 0 188 6237 2746 3193 2 0 80 17
0 2 573424 312736 27200 28506348 0 0 804 18466 5014 6430 2 1 75 23
2 35 573424 307564 27200 28509920 0 0 3912 16280 14377 15470 14 3 28 56
0 5 573424 282848 27208 28533964 0 0 7484 27580 22017 25938 17 3 17 63
1 5 573424 221100 27208 28563360 0 0 2852 3120 19639 28664 12 5 52 31
0 4 573428 229912 26704 28519184 0 4 1208 5890 13976 20851 13 3 56 28
0 2 573448 234680 26672 28513632 0 20 0 17204 1694 2636 0 0 71 28
3 7 573452 220836 26644 28525548 0 4 1540 36370 27928 36551 17 5 50 27
1 3 573488 234380 26556 28517416 0 36 584 19066 8275 9467 3 2 60 36
0 1 573488 234496 26556 28517852 0 0 56 47429 3290 4310 0 0 79 20

6) sudo lsof - a hell of a lot of output, I can post it if anyone is interested :-)

#### Notes and thoughts ##############################################################################

As you can see, even though I have moved the pg_xlog folder to the SSD array (md3), by far the largest amount of writes still goes to the regular HDDs (md2), which puzzles me - what can that be?
From stat 3) (the iostat) I notice that the SSDs don't seem to be anywhere near fully utilized - maybe something other than just pg_xlog could be moved here?
I have no idea whether the amount of reads/writes is within what my kind of hardware should be able to handle, or far beyond it.
In stat 4) (the iotop) it says that the RAID array (md2) is the most "waiting" part - does that look like a root cause, or more like a symptom of some other bottleneck?
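Regarding the md2 writes, one way to break down who is doing the flushing (a sketch against the standard statistics view) is:

psql -d production -c "SELECT checkpoints_timed, checkpoints_req, buffers_checkpoint, buffers_clean, buffers_backend FROM pg_stat_bgwriter;"

If buffers_backend dominates, the backends themselves are forced to write dirty pages out, rather than the checkpointer or background writer doing it.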

Thanks for taking the time to look at my data! :-)
Evgeny Shishkin
2012-12-10 23:00:51 UTC
Permalink
Post by Niels Kristian Schjødt
The pg_xlog folder has been moved onto the SSD array (md3), and symlinked back into the postgres dir.
Actually, you should move xlog to the rotating drives, since WAL is written sequentially, and move everything else to the SSDs, because of their random I/O pattern.
Jeff Janes
2012-12-10 23:58:29 UTC
Permalink
On Mon, Dec 10, 2012 at 2:51 PM, Niels Kristian Schjødt
Post by Niels Kristian Schjødt
synchronous_commit = off
The pg_xlog folder has been moved onto the SSD array (md3), and symlinked
back into the postgres dir.
With synchronous_commit = off, or with large transactions, there is
probably no advantage to moving those to SSD.
Post by Niels Kristian Schjødt
2)
When the database is loaded like this, I see a lot of queries talking up to
1000 times as long, as they would when the database is not loaded so
heavily.
What kinds of queries are they? single-row look-ups, full table scans, etc.
Post by Niels Kristian Schjødt
#### Notes and thoughts
##############################################################################
As you can see, even though I have moved the pg_xlog folder to the SSD array
(md3) the by far largest amount of writes still goes to the regular HDD's
(md2), which puzzles me - what can that be?
Every row you insert or non-HOT update has to do maintenance on all
indexes of that table. If the rows are not inserted/updated in index
order, every row inserted or updated dirties a randomly
scattered 8KB block for each of the indexes. If you have lots of indexes
per table, that adds up fast.
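A quick way to see how many of those updates are HOT, and therefore skip index maintenance entirely (a sketch using the standard statistics views):

psql -d production -c "SELECT relname, n_tup_upd, n_tup_hot_upd FROM pg_stat_user_tables ORDER BY n_tup_upd DESC LIMIT 10;"

Tables where n_tup_hot_upd is only a small fraction of n_tup_upd are the ones generating most of that scattered index I/O.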

The fact that there is much more writing than reading tells me that
most of your indexes are in RAM. The amount of index you are rapidly
reading and dirtying is large enough to fit in RAM, but is not large
enough to fit in shared_buffers + kernel's dirty-buffer comfort level.
So you are redirtying the same blocks over and over, PG is
desperately dumping them to the kernel (because shared_buffers is too
small to hold them) and the kernel is desperately dumping them to
disk, because vm.dirty_background_ratio is so low. There is little
opportunity for write-combining, because they don't sit in memory long
enough to accumulate neighbors.

How big are your indexes?

You could really crank up shared_buffers or vm.dirty_background_ratio,
but doing so might cause problems with checkpoints stalling and
latency spikes. That would probably not be a problem during the
night, but could be during the day.

Rather than moving maintenance to the day and hoping it doesn't
interfere with normal operations, I'd focus on making night-time
maintenance more efficient, for example by dropping indexes (either
just at night, or if some indexes are not useful, just get rid of them
altogether), or cranking up shared_buffers at night, or maybe
partitioning, or looking into pg_bulkload.
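For the drop-at-night variant, a minimal sketch (the index and table names here are made up for illustration):

psql -d production -c "DROP INDEX IF EXISTS cars_updated_at_idx;"
# ... run the nightly maintenance ...
psql -d production -c "CREATE INDEX CONCURRENTLY cars_updated_at_idx ON cars (updated_at);"

CREATE INDEX CONCURRENTLY avoids blocking writes while it rebuilds, at the cost of taking longer than a plain CREATE INDEX.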
Post by Niels Kristian Schjødt
From stat 3) (the iostat) I notice that the SSD's doesn't seem to be
something near fully utilized - maybe something else than just pg_xlog could
be moved her?
I don't know how big each disk is, or how big your various categories
of data are. Could you move everything to SSD? Could you move all
your actively updated indexes there?
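Moving individual indexes (or tables) can be done with a tablespace on the SSD array - a sketch, assuming a /ssd/pg_indexes directory owned by postgres and a made-up index name:

psql -d production -c "CREATE TABLESPACE ssd_space LOCATION '/ssd/pg_indexes';"
psql -d production -c "ALTER INDEX cars_updated_at_idx SET TABLESPACE ssd_space;"

ALTER ... SET TABLESPACE physically rewrites the index on the new storage and locks it while doing so, so it is best done off-peak.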

Or, more fundamentally, it looks like you spent too much on CPUs (86%
idle) and not nearly enough on disks. Maybe you can fix that for less
money than it will cost you in your optimization time to make the best
of the disks you already have.

Cheers,

Jeff
Niels Kristian Schjødt
2012-12-11 10:04:46 UTC
Permalink
Post by Jeff Janes
On Mon, Dec 10, 2012 at 2:51 PM, Niels Kristian Schjødt
Post by Niels Kristian Schjødt
synchronous_commit = off
The pg_xlog folder has been moved onto the SSD array (md3), and symlinked
back into the postgres dir.
With synchronous_commit = off, or with large transactions, there is
probably no advantage to moving those to SSD.
Post by Niels Kristian Schjødt
2)
When the database is loaded like this, I see a lot of queries talking up to
1000 times as long, as they would when the database is not loaded so
heavily.
What kinds of queries are they? single-row look-ups, full table scans, etc.
Well, mostly they are updates, like the one shown in the previous question I referenced.
Post by Jeff Janes
Post by Niels Kristian Schjødt
#### Notes and thoughts
##############################################################################
As you can see, even though I have moved the pg_xlog folder to the SSD array
(md3) the by far largest amount of writes still goes to the regular HDD's
(md2), which puzzles me - what can that be?
Every row you insert or non-HOT update has to do maintenance on all
indexes of that table. If the rows are not inserted/updated in index
order, every row inserted or updated dirties a randomly
scattered 8KB block for each of the indexes. If you have lots of indexes
per table, that adds up fast.
The fact that there is much more writing than reading tells me that
most of your indexes are in RAM. The amount of index you are rapidly
reading and dirtying is large enough to fit in RAM, but is not large
enough to fit in shared_buffers + kernel's dirty-buffer comfort level.
Maybe I should mention that I never see more than 5GB at most out of my total 32GB being in use on the server… Can I somehow utilize more of it?
Post by Jeff Janes
So you are redirtying the same blocks over and over, PG is
desperately dumping them to the kernel (because shared_buffers is too
small to hold them) and the kernel is desperately dumping them to
disk, because vm.dirty_background_ratio is so low. There is little
opportunity for write-combining, because they don't sit in memory long
enough to accumulate neighbors.
How big are your indexes?
This is a size list of all my indexes: 117 MB, 118 MB, 11 MB, 12 MB, 12 MB, 12 MB, 12 MB, 140 MB, 15 MB, 15 MB, 16 kB, 16 kB, 16 kB, 16 kB, 16 kB, 16 kB, 16 kB, 16 kB, 16 kB, 16 kB, 16 kB, 16 kB, 16 MB, 16 MB, 176 kB, 176 kB, 17 MB, 18 MB, 19 MB, 23 MB, 240 kB, 24 MB, 256 kB, 25 MB, 25 MB, 26 MB, 26 MB, 27 MB, 27 MB, 27 MB, 27 MB, 280 MB, 2832 kB, 2840 kB, 288 kB, 28 MB, 28 MB, 28 MB, 28 MB, 28 MB, 28 MB, 28 MB, 28 MB, 29 MB, 29 MB, 3152 kB, 3280 kB, 32 kB, 32 MB, 32 MB, 3352 kB, 3456 kB, 34 MB, 36 MB, 3744 kB, 3776 kB, 37 MB, 37 MB, 3952 kB, 400 kB, 408 kB, 40 kB, 40 kB, 40 kB, 416 kB, 416 kB, 42 MB, 432 kB, 4520 kB, 4720 kB, 47 MB, 48 kB, 496 kB, 49 MB, 512 kB, 52 MB, 52 MB, 5304 kB, 5928 kB, 6088 kB, 61 MB, 6224 kB, 62 MB, 6488 kB, 64 kB, 6512 kB, 71 MB, 72 kB, 72 kB, 8192 bytes, 8400 kB, 88 MB, 95 MB, 98 MB
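For reference, a list like that can be produced with something along these lines (a sketch using the catalog views):

psql -d production -c "SELECT c.relname, pg_size_pretty(pg_relation_size(c.oid)) FROM pg_class c JOIN pg_namespace n ON n.oid = c.relnamespace WHERE c.relkind = 'i' AND n.nspname NOT IN ('pg_catalog', 'pg_toast') ORDER BY pg_relation_size(c.oid) DESC;"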
Post by Jeff Janes
You could really crank up shared_buffers or vm.dirty_background_ratio,
but doing so might cause problems with checkpoints stalling and
latency spikes. That would probably not be a problem during the
night, but could be during the day.
What do you have in mind here? Tweaking what parameters to what values?
Post by Jeff Janes
.
Rather than moving maintenance to the day and hoping it doesn't
interfere with normal operations, I'd focus on making night-time
maintenance more efficient, for example by dropping indexes (either
just at night, or if some indexes are not useful, just get rid of them
altogether), or cranking up shared_buffers at night, or maybe
partitioning or look into pg_bulkload.
Post by Niels Kristian Schjødt
From stat 3) (the iostat) I notice that the SSD's doesn't seem to be
something near fully utilized - maybe something else than just pg_xlog could
be moved her?
I don't know how big each disk is, or how big your various categories
of data are. Could you move everything to SSD? Could you move all
your actively updated indexes there?
With tablespaces, you mean?
Post by Jeff Janes
Or, more fundamentally, it looks like you spent too much on CPUs (86%
idle) and not nearly enough on disks. Maybe you can fix that for less
money than it will cost you in your optimization time to make the best
of the disks you already have.
The SSDs I use are 240GB each, which will grow too small within a few months - so how does moving the whole data dir onto four of those in a RAID5 array sound?
Craig Ringer
2012-12-11 13:29:00 UTC
Permalink
Post by Niels Kristian Schjødt
Maybe I should mention, that I never see more than max 5Gb out of my total 32Gb being in use on the server… Can I somehow utilize more of it?
For an update-mostly workload it probably won't do you tons of good so
long as all your indexes fit in RAM. You're clearly severely
bottlenecked on disk I/O not RAM.
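As for the 5GB figure: it is worth checking how that is measured. With free -m, memory used by the Linux page cache shows up under "cached" rather than as process memory, so the box may well be using most of the 32GB for caching already. A quick check (a sketch):

free -m
# "cached" is the page cache; the "-/+ buffers/cache" line shows what applications actually hold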
Post by Niels Kristian Schjødt
The SSD's I use a are 240Gb each which will grow too small within a
few months - so - how does moving the whole data dir onto four of
those in a RAID5 array sound?
Not RAID 5!

Use a RAID10 of four or six SSDs.
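If you do stay on software RAID as you have now, that would be something along these lines (a sketch with hypothetical device names):

sudo mdadm --create /dev/md4 --level=10 --raid-devices=4 /dev/sde /dev/sdf /dev/sdg /dev/sdh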
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Niels Kristian Schjødt
2012-12-11 13:35:09 UTC
Permalink
Post by Craig Ringer
Post by Niels Kristian Schjødt
Maybe I should mention, that I never see more than max 5Gb out of my total 32Gb being in use on the server… Can I somehow utilize more of it?
For an update-mostly workload it probably won't do you tons of good so
long as all your indexes fit in RAM. You're clearly severely
bottlenecked on disk I/O not RAM.
Post by Niels Kristian Schjødt
The SSD's I use a are 240Gb each which will grow too small within a
few months - so - how does moving the whole data dir onto four of
those in a RAID5 array sound?
Not RAID 5!
Use a RAID10 of four or six SSDs.
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Hehe, got it - did you have a look at the SSDs I am considering building it from? http://ark.intel.com/products/66250/Intel-SSD-520-Series-240GB-2_5in-SATA-6Gbs-25nm-MLC
Are they suitable, do you think?
Evgeny Shishkin
2012-12-11 17:15:59 UTC
Permalink
Post by Niels Kristian Schjødt
Post by Craig Ringer
Post by Niels Kristian Schjødt
Maybe I should mention, that I never see more than max 5Gb out of my total 32Gb being in use on the server… Can I somehow utilize more of it?
For an update-mostly workload it probably won't do you tons of good so
long as all your indexes fit in RAM. You're clearly severely
bottlenecked on disk I/O not RAM.
Post by Niels Kristian Schjødt
The SSD's I use a are 240Gb each which will grow too small within a
few months - so - how does moving the whole data dir onto four of
those in a RAID5 array sound?
Not RAID 5!
Use a RAID10 of four or six SSDs.
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Hehe got it - did you have a look at the SSD's I am considering building it of? http://ark.intel.com/products/66250/Intel-SSD-520-Series-240GB-2_5in-SATA-6Gbs-25nm-MLC
Are they suitable do you think?
I am not Craig, but I use them in production in a RAID10 array now.
Niels Kristian Schjødt
2012-12-11 18:54:28 UTC
Permalink
And what is your experience so far?
Evgeny Shishkin
2012-12-11 19:11:14 UTC
Permalink
Post by Niels Kristian Schjødt
And what is your experience so far?
Increased TPS by a factor of 10; the database is no longer a limiting factor for the application.
And it is cheaper than brand-name rotating drives.
Niels Kristian Schjødt
2012-12-11 22:41:37 UTC
Permalink
Are you using a hardware-based RAID controller with them?
Evgeny Shishkin
2012-12-11 22:44:24 UTC
Permalink
Post by Niels Kristian Schjødt
Are you using a hardware based raid controller with them?
Yes, of course. Hardware RAID with cache and BBU is a must. You can't get fast fsync without it.
Also, mdadm is a pain in the ass and is suitable only on Amazon and other cloud setups.
Craig Ringer
2012-12-12 01:03:14 UTC
Permalink
On Dec 12, 2012, at 2:41 AM, Niels Kristian Schjødt
Post by Niels Kristian Schjødt
Are you using a hardware based raid controller with them?
Yes, of course. Hardware raid with cache and bbu is a must. You can't
get fast fsync without it.
Most SSDs should offer fairly fast fsync without a hardware RAID
controller, as they do write-back caching. The trick is to find ones
that do write-back caching safely, so you don't get severe data
corruption on power-loss.

A HW RAID controller is an absolute must for rotating magnetic media,
though.
Also mdadm is a pain in the ass and is suitable only on amazon and other cloud shit.
I've personally been pretty happy with mdadm. I find the array
portability it offers very useful, so I don't need to buy a second RAID
controller just in case my main controller dies and I need a compatible
one to get the array running again. If you don't need a BBU for safe
write-back caching then mdadm has advantages over hardware RAID.

I'll certainly use mdadm over onboard fakeraid solutions or low-end
hardware RAID controllers. I suspect a mid- to high end HW RAID unit
will generally win.
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Evgeny Shishkin
2012-12-12 01:17:14 UTC
Permalink
Post by Evgeny Shishkin
Post by Niels Kristian Schjødt
Are you using a hardware based raid controller with them?
Yes, of course. Hardware raid with cache and bbu is a must. You can't get fast fsync without it.
Most SSDs should offer fairly fast fsync without a hardware RAID controller, as they do write-back caching. The trick is to find ones that do write-back caching safely, so you don't get severe data corruption on power-loss.
Actually most low-end SSDs don't do write caching; they do not have enough RAM for that. SandForce, for example.
A HW RAID controller is an absolute must for rotating magnetic media, though.
Post by Evgeny Shishkin
Also mdadm is a pain in the ass and is suitable only on amazon and other cloud shit.
I've personally been pretty happy with mdadm. I find the array portability it offers very useful, so I don't need to buy a second RAID controller just in case my main controller dies and I need a compatible one to get the array running again. If you don't need a BBU for safe write-back caching then mdadm has advantages over hardware RAID.
If we are talking about a dedicated database machine with SSD drives, why wouldn't anybody buy a hardware RAID for about $500-700?
I'll certainly use mdadm over onboard fakeraid solutions or low-end hardware RAID controllers. I suspect a mid- to high end HW RAID unit will generally win.
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Craig Ringer
2012-12-12 01:29:02 UTC
Permalink
Post by Evgeny Shishkin
Actually most of low-end SSDs don't do write caching, they do not have
enough ram for that. Sandforce for example.
Or, worse, some of them do limited write caching but don't protect their
write cache from power loss. Instant data corruption!

I would be extremely reluctant to use low-end SSDs for a database server.
Post by Evgeny Shishkin
If we are talking about dedicated machine for database with ssd
drives, why would anybody don't by hardware raid for about 500-700$?
I'd want to consider whether the same money is better spent on faster,
higher quality SSDs with their own fast write caches.
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Evgeny Shishkin
2012-12-12 01:44:24 UTC
Permalink
Post by Evgeny Shishkin
Actually most of low-end SSDs don't do write caching, they do not have enough ram for that. Sandforce for example.
Or, worse, some of them do limited write caching but don't protect their write cache from power loss. Instant data corruption!
I would be extremely reluctant to use low-end SSDs for a database server.
Post by Evgeny Shishkin
If we are talking about dedicated machine for database with ssd drives, why would anybody don't by hardware raid for about 500-700$?
I'd want to consider whether the same money is better spent on faster, higher quality SSDs with their own fast write caches.
High-quality SSDs cost $5-7 per GB, consumer-grade around $1, high-end around $11.
The new Intel DC S3700 is $2-3 per GB as far as I remember.

So far, for more than a year already, I have been buying consumer SSDs with a $300-400 HW RAID. Cost-effective and fast, maybe not very safe, but so far so good. All data protection measures in PostgreSQL are on, of course.
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Craig Ringer
2012-12-12 02:02:00 UTC
Permalink
Post by Evgeny Shishkin
So far, more than a year already, i bought consumer ssds with 300-400$
hw raid. Cost effective and fast, may be not very safe, but so far so
good. All data protection measures from postgresql are on, of course.
You're aware that many low end SSDs lie to the RAID controller about
having written data, right? Even if the RAID controller sends a flush
command, the SSD might cache the write in non-durable cache. If you're
using such SSDs and you lose power, data corruption is extremely likely,
because your SSDs are essentially ignoring fsync.

Your RAID controller's BBU won't save you, because once the disks tell
the RAID controller the data has hit durable storage, the RAID
controller feels free to flush it from its battery backed cache. If the
disks are lying...

The only solid way to find out whether this is an issue with your SSDs is
to do plug-pull testing.
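A cheap smoke test before the real plug-pull (a sketch - pg_test_fsync ships in the postgresql contrib package; the binary path and the target directory are assumptions):

/usr/lib/postgresql/9.2/bin/pg_test_fsync -f /ssd/fsync_test.out
# tens of thousands of fsyncs/second from a drive without a protected write cache
# almost certainly means it is acknowledging writes it has not made durable

That is only a hint, though; the plug-pull test remains the definitive check.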
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Evgeny Shishkin
2012-12-12 02:13:22 UTC
Permalink
Post by Evgeny Shishkin
So far, more than a year already, i bought consumer ssds with 300-400$ hw raid. Cost effective and fast, may be not very safe, but so far so good. All data protection measures from postgresql are on, of course.
You're aware that many low end SSDs lie to the RAID controller about having written data, right? Even if the RAID controller sends a flush command, the SSD might cache the write in non-durable cache. If you're using such SSDs and you lose power, data corruption is extremely likely, because your SSDs are essentially ignoring fsync.
Your RAID controller's BBU won't save you, because once the disks tell the RAID controller the data has hit durable storage, the RAID controller feels free to flush it from its battery backed cache. If the disks are lying...
The only solid way to find out if this is an issue with your SSDs is to do plug-pull testing and find out.
Yes, I am aware of this issue. I have never experienced it, neither on the Intel 520 nor the OCZ Vertex 3.
Have you heard of such failures on this list?
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
David Boreham
2012-12-12 02:15:33 UTC
Permalink
Post by Evgeny Shishkin
Yes, i am aware of this issue. Never experienced this neither on intel
520, no ocz vertex 3.
Have you heard of them on this list?
People have done plug-pull tests and reported the results on the list
(sometime in the past couple of years).

But you don't need to do the test to know these drives are not safe.
They're unsafe by design.
Evgeny Shishkin
2012-12-12 02:20:29 UTC
Permalink
Post by Evgeny Shishkin
Yes, i am aware of this issue. Never experienced this neither on intel 520, no ocz vertex 3.
Have you heard of them on this list?
People have done plug-pull tests and reported the results on the list (sometime in the past couple of years).
But you don't need to do the test to know these drives are not safe. They're unsafe by design.
Oh, there is no 100% safe system. The only way to be sure is to read data back.
Everything about system design is a tradeoff between cost and risk.
David Boreham
2012-12-12 02:26:45 UTC
Permalink
Post by Evgeny Shishkin
Oh, there is no 100% safe system.
In this case we're discussing specifically "safety in the event of power
loss shortly after the drive indicates to the controller that it has
committed a write operation". Some drives do provide 100% safety against
this event, and they don't cost much more than those that don't.
Evgeny Shishkin
2012-12-12 02:38:30 UTC
Permalink
Post by Evgeny Shishkin
Oh, there is no 100% safe system.
In this case we're discussing specifically "safety in the event of power loss shortly after the drive indicates to the controller that it has committed a write operation". Some drives do provide 100% safety against this event, and they don't cost much more than those that don't.
Which drives would you recommend, besides the Intel 320 and 710?
David Boreham
2012-12-12 02:41:07 UTC
Permalink
Post by Evgeny Shishkin
Which drives would you recommend? Besides intel 320 and 710.
Those are the only drive types we have deployed in servers at present
(almost all 710, but we have some 320 for less mission-critical
machines). The new DC-S3700 Series looks nice too, but isn't yet in the
sales channel:
http://www.intel.com/content/www/us/en/solid-state-drives/solid-state-drives-dc-s3700-series.html
Evgeny Shishkin
2012-12-12 02:49:12 UTC
Permalink
Post by David Boreham
Post by Evgeny Shishkin
Which drives would you recommend? Besides intel 320 and 710.
http://www.intel.com/content/www/us/en/solid-state-drives/solid-state-drives-dc-s3700-series.html
Yeah, the S3700 looks promising, but the SATA interface is a limiting factor for this drive.
I'm looking towards the SMART SSD: http://www.storagereview.com/smart_storage_systems_optimus_sas_enterprise_ssd_review

but I haven't heard of it anywhere else.
David Boreham
2012-12-12 03:05:13 UTC
Permalink
Post by Evgeny Shishkin
Yeah, s3700 looks promising, but sata interface is limiting factor for this drive.
I'm looking towards SMART ssd
http://www.storagereview.com/smart_storage_systems_optimus_sas_enterprise_ssd_review
What don't you like about SATA?

I prefer to avoid SAS drives if possible due to the price premium for
dubious benefits besides vague hand-waving "enterprise-ness" promises.
Evgeny Shishkin
2012-12-12 03:11:43 UTC
Permalink
Post by David Boreham
Post by Evgeny Shishkin
Yeah, s3700 looks promising, but sata interface is limiting factor for this drive.
I'm looking towards SMART ssd http://www.storagereview.com/smart_storage_systems_optimus_sas_enterprise_ssd_review
What don't you like about SATA ?
I prefer to avoid SAS drives if possible due to the price premium for dubious benefits besides vague hand-waving "enterprise-ness" promises.
Quoting http://www.storagereview.com/intel_ssd_dc_s3700_series_enterprise_ssd_review

Intel makes the case that the S3700 is the ideal drive for entry, mainstream and performance enterprise computing including HPC use cases. The claim is bold, largely because of the decision to go with a SATA interface, which has several limitations in the enterprise. The SATA interface tops out at a queue depth 32 (SAS scales as high as 256 in most cases) which means that when requests go above that level average and peak latency spike as we saw in all of our workloads.

Another huge advantage of SAS is the ability to offer dual-port modes for high availability scenarios, where there are two controllers interfacing with the same drive at the same time. In the event one goes offline, the connection with the SSD is not lost, as it would with a standard SATA interface without additional hardware. Some SAS drives also offer wide-port configurations used to increase total bandwidth above a single-link connection. While the Intel SSD DC S3700 against other SATA competitors is very fast, the story changes when you introduce the latest MLC and SLC-based SAS SSDs, which can cope better with increased thread and queue levels.

We picked the primary post-preconditioning sections of our benchmarks after each SSD had reached steady-state. For the purposes of this section, we added the Intel SSD DC S3700 onto the throughput charts of the newest SAS high-performance SSDs. There are also significant latency differences at higher queue depths that play a significant factor, but for the sake of easy comparison we stick with raw I/O speed across varying thread and queue counts.

In a 100% 4K random write or random read scenario, the Intel SSD DC 3700 performs quite well up against the high-end SAS competition, with the second fastest 4K steady-state speed. When you switch focus to read throughput at a heavy 16T/16Q load it only offers 1/2 to 1/3 the performance of SSDs in this category.

David Boreham
2012-12-12 03:19:35 UTC
Permalink
Post by Evgeny Shishkin
Quoting
http://www.storagereview.com/intel_ssd_dc_s3700_series_enterprise_ssd_review
Heh. A fine example of the kind of hand-waving of which I spoke ;)

Higher performance is certainly a benefit, although at present we can't
saturate even a single 710 series drive (the application, CPU, OS, etc
is the bottleneck). Similarly while dual-porting certainly has its uses,
it is not something I need.
Mark Kirkwood
2012-12-12 03:07:31 UTC
Permalink
Post by David Boreham
Post by Evgeny Shishkin
Which drives would you recommend? Besides intel 320 and 710.
Those are the only drive types we have deployed in servers at present
(almost all 710, but we have some 320 for less mission-critical
machines). The new DC-S3700 Series looks nice too, but isn't yet in
http://www.intel.com/content/www/us/en/solid-state-drives/solid-state-drives-dc-s3700-series.html
+1

The s3700 is probably the one to get (when it is available). I'd opt for
the 710 if you need something now. I'd avoid the 320 - we have
encountered the firmware bug whereby you get an 8MB (yes 8MB) capacity
after powerdown with a depressingly large number of them (they were
updated to the latest firmware too).

Regards

Mark
Craig Ringer
2012-12-12 02:47:51 UTC
Permalink
Post by Evgeny Shishkin
Yes, I am aware of this issue. I've never experienced it on either the
Intel 520 or the OCZ Vertex 3.
I wouldn't trust either of those drives. The 520 doesn't have Intel's "
Enhanced Power Loss Data Protection"; it's going to lose its buffers if
it loses power. Similarly, the Vertex 3 doesn't have any kind of power
protection. See:

http://www.intel.com/content/www/us/en/solid-state-drives/ssd-320-series-power-loss-data-protection-brief.html
http://ark.intel.com/products/family/56572/Intel-SSD-500-Family

http://www.ocztechnology.com/res/manuals/OCZ_SSD_Breakdown_Q2-11_1.pdf

The only way I'd use those for a production server would be if I had
synchronous replication running to another machine with trustworthy,
durable storage - and if I didn't mind some downtime to restore the
corrupt DB from the replica after power loss.
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Niels Kristian Schjødt
2012-12-12 16:22:10 UTC
Permalink
Well, in fact I do (as you can see from my configuration). I have a similar server running with hot standby replication, and it runs two 3 TB HDDs in a RAID1 array.

So - is it still very bad if I choose to put four intel 520 disks in a RAID10 array on the other production server?
Post by Craig Ringer
Post by Evgeny Shishkin
Yes, I am aware of this issue. I've never experienced it on either the Intel 520 or the OCZ Vertex 3.
http://www.intel.com/content/www/us/en/solid-state-drives/ssd-320-series-power-loss-data-protection-brief.html
http://ark.intel.com/products/family/56572/Intel-SSD-500-Family
http://www.ocztechnology.com/res/manuals/OCZ_SSD_Breakdown_Q2-11_1.pdf
The only way I'd use those for a production server would be if I had synchronous replication running to another machine with trustworthy, durable storage - and if I didn't mind some downtime to restore the corrupt DB from the replica after power loss.
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Craig Ringer
2012-12-12 23:26:36 UTC
Permalink
Post by Niels Kristian Schjødt
Well, in fact I do (as you can see from my configuration). I have a
similar server running with hot standby replication, and it runs two
3 TB HDDs in a RAID1 array.
So - is it still very bad if I choose to put four intel 520 disks in a
RAID10 array on the other production server?
So long as you have it recording to a synchronous replica on another
machine and you're fully prepared to accept the small risk that you'll
have total and unrecoverable data corruption on that server, with the
corresponding downtime while you rebuild it from the replica, it should
be OK.

Alternatively, you could use PITR with a base backup, shipping WAL to another
machine or a reliable HDD, so you can recover all but the last
checkpoint_timeout minutes of data from the base backup + WAL. There's a
small window of data loss that way, but you don't need a second machine
as a streaming replication follower. barman is worth checking out
as a management tool for PITR backups.
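A minimal sketch of the moving parts, with hypothetical host names and
paths (and untested as written - do a trial restore before relying on it):

# postgresql.conf on the primary: ship each completed WAL segment off-box
archive_mode = on
archive_command = 'rsync -a %p backuphost:/backups/wal/%f'

# shell, as the postgres user: take a periodic base backup
pg_basebackup -D /backups/base/$(date +%F) -Fp -X fetch

# recovery.conf on the machine you restore onto: replay WAL from the archive
restore_command = 'cp /backups/wal/%f %p'

barman automates essentially this cycle (base backups, WAL archiving,
retention).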

If the data is fairly low-value you could even just take nightly backups
and accept the risk of losing some data.

--
Craig Ringer
Rosser Schwarz
2012-12-12 01:47:28 UTC
Permalink
Post by Evgeny Shishkin
Actually most low-end SSDs don't do write caching; they do not have
enough RAM for that.
AIUI, *all* SSDs do write-caching of a sort: writes are actually flushed to
the NAND media by erasing, and then overwriting the erased space, and
erasing is done in fixed-size blocks, usually much larger than a
filesystem's pages. The drive's controller accumulates writes in an
on-board cache until it has an "erase block"'s worth of them, which are
then flushed. From casual searching, a common erase block size is 256
kbytes, while filesystem-level pages are usually 4k.
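(As a rough worked example under those assumptions: 256 kB / 4 kB = 64
filesystem pages per erase block, so rewriting a single dirty 4k page in
place can mean relocating up to 64 pages' worth of data, which is exactly
why the controller accumulates writes until it can fill a whole erase
block.)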

Most low-end (and even many mid-range) SSDs, including Sandforce-based
drives, don't offer any form of protection (e.g., supercaps, as featured on
the Intel 320 and 710-series drives) for the data in that write cache,
however, which may be what you're thinking of. I wouldn't let one of those
anywhere near one of my servers, unless it was a completely disposable,
load-balanced slave, and probably not even then.

rls
--
:wq
Scott Marlowe
2012-12-15 00:22:14 UTC
Permalink
On Dec 12, 2012, at 2:41 AM, Niels Kristian Schjødt
Are you using a hardware based raid controller with them?
Yes, of course. Hardware raid with cache and bbu is a must. You can't get
fast fsync without it.
Most SSDs should offer fairly fast fsync without a hardware RAID controller,
as they do write-back caching. The trick is to find ones that do write-back
caching safely, so you don't get severe data corruption on power-loss.
A HW RAID controller is an absolute must for rotating magnetic media,
though.
Also mdadm is a pain in the ass and is suitable only on amazon and other cloud shit.
I've personally been pretty happy with mdadm. I find the array portability
it offers very useful, so I don't need to buy a second RAID controller just
in case my main controller dies and I need a compatible one to get the array
running again. If you don't need a BBU for safe write-back caching then
mdadm has advantages over hardware RAID.
I'll certainly use mdadm over onboard fakeraid solutions or low-end hardware
RAID controllers. I suspect a mid- to high end HW RAID unit will generally
win.
Also for sequential throughput md RAID is usually faster than most
RAID controllers, even the high end Areca and LSI ones.
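For reference, building such an md array is a one-liner; the device names
below are purely illustrative (check lsblk first), and the array also needs
to be recorded in mdadm.conf to come back cleanly after a reboot:

mdadm --create /dev/md0 --level=10 --raid-devices=4 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
cat /proc/mdstat                                  # watch the initial resync
mdadm --detail --scan >> /etc/mdadm/mdadm.conf    # persist the array definition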
Jeff Janes
2012-12-11 17:25:21 UTC
Permalink
On Tue, Dec 11, 2012 at 2:04 AM, Niels Kristian Schjødt
Post by Niels Kristian Schjødt
Post by Jeff Janes
The fact that there is much more writing than reading tells me that
most of your indexes are in RAM. The amount of index you are rapidly
reading and dirtying is large enough to fit in RAM, but is not large
enough to fit in shared_buffers + kernel's dirty-buffer comfort level.
Maybe I should mention that I never see more than about 5 GB out of my total 32 GB being in use on the server… Can I somehow utilize more of it?
What tool do you use to determine that? Is that on top of the 4GB
shared_buffers, or does it include it?

How big is your entire data set? Maybe all your data fits in 5GB
(believable, as all your indexes listed below sum to < 2.5GB) so there
is no need to use more.

Or maybe you have hit a bug in the 3.2 kernel. At least one of those
has been frequently discussed.
Post by Niels Kristian Schjødt
Post by Jeff Janes
You could really crank up shared_buffers or vm.dirty_background_ratio,
but doing so might cause problems with checkpoints stalling and
latency spikes. That would probably not be a problem during the
night, but could be during the day.
What do you have in mind here? Tweaking what parameters to what values?
I'd set shared_buffers to 20GB (or 10GB, if that will hold all of your
data) and see what happens. And probably increase checkpoint_timeout
and checkpoint_segments about 3x each. Also, turn on log_checkpoints
so you can see what kinds of problem those changes may be causing
there (i.e. long sync times). Preferably you do this on some kind of
pre-production or test server.
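In postgresql.conf terms that experiment would look something like the
following (illustrative values only, not a recommendation):

shared_buffers = 20GB        # or 10GB if that already holds the whole data set
checkpoint_segments = 300    # roughly 3x whatever is set now
checkpoint_timeout = 30min   # likewise roughly 3x
log_checkpoints = on         # so long sync times show up in the server log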

But if your database is growing so rapidly that it soon won't fit on
240GB, then cranking up shared_buffers won't do for long. If you can
get your tables and all of their indexes clustered together, then you
can do the updates in an order that makes IO more efficient. Maybe
partitioning would help.
Post by Niels Kristian Schjødt
Post by Jeff Janes
I don't know how big each disk is, or how big your various categories
of data are. Could you move everything to SSD? Could you move all
your actively updated indexes there?
With table spaces you mean?
Yes. Or moving everything to SSD if it fits, then you don't have go
through and separate objects.

The UPDATE you posted in a previous thread looked like the table
blocks might also be getting dirtied in a fairly random order, which
means the table blocks are in the same condition as the index blocks
so maybe singling out the indexes isn't warranted.

Cheers,

Jeff
Niels Kristian Schjødt
2012-12-12 16:46:11 UTC
Permalink
Post by Jeff Janes
On Tue, Dec 11, 2012 at 2:04 AM, Niels Kristian Schjødt
Post by Niels Kristian Schjødt
Post by Jeff Janes
The fact that there is much more writing than reading tells me that
most of your indexes are in RAM. The amount of index you are rapidly
reading and dirtying is large enough to fit in RAM, but is not large
enough to fit in shared_buffers + kernel's dirty-buffer comfort level.
Maybe I should mention that I never see more than about 5 GB out of my total 32 GB being in use on the server… Can I somehow utilize more of it?
What tool do you use to determine that? Is that on top of the 4GB
shared_buffers, or does it include it?
Okay, I might not have made myself clear; I was talking about "physical" memory utilization. Here are the stats:
free -m
             total       used       free     shared    buffers     cached
Mem:         32075      25554       6520          0         69      22694
-/+ buffers/cache:       2791      29284
Swap:         2046        595       1451
Post by Jeff Janes
How big is your entire data set? Maybe all your data fits in 5GB
(believable, as all your indexes listed below sum to < 2.5GB) so there
is no need to use more.
It doesn't. We are a search engine for used cars, and there are quite a lot of those out there :-) However, my indexes are almost all partial indexes, which means they only cover cars that are still for sale, so in that sense the indexes themselves don't really grow, but the tables do.
Post by Jeff Janes
Or maybe you have hit a bug in the 3.2 kernel. At least one of those
has been frequently discussed.
Might be true - but likely?
Post by Jeff Janes
Post by Niels Kristian Schjødt
Post by Jeff Janes
You could really crank up shared_buffers or vm.dirty_background_ratio,
but doing so might cause problems with checkpoints stalling and
latency spikes. That would probably not be a problem during the
night, but could be during the day.
What do you have in mind here? Tweaking what parameters to what values?
I'd set shared_buffers to 20GB (or 10GB, if that will hold all of your
I had that before; Shaun suggested that I change it to 4GB, as he was talking about strange behavior when it is larger than that on 12.04. But I can say that there has not been any notable difference between having it at 4GB and at 8GB.
Post by Jeff Janes
data) and see what happens. And probably increase checkpoint_timeout
and checkpoint_segments about 3x each. Also, turn on log_checkpoints
so you can see what kinds of problem those changes may be causing
there (i.e. long sync times). Preferably you do this on some kind of
pre-production or test server.
But if your database is growing so rapidly that it soon won't fit on
240GB, then cranking up shared_buffers won't do for long. If you can
get your tables and all of their indexes clustered together, then you
can do the updates in an order that makes IO more efficient. Maybe
partitioning would help.
Can you explain a little more about this, or provide me a good link?
Post by Jeff Janes
Post by Niels Kristian Schjødt
Post by Jeff Janes
I don't know how big each disk is, or how big your various categories
of data are. Could you move everything to SSD? Could you move all
your actively updated indexes there?
With table spaces you mean?
Yes. Or moving everything to SSD if it fits, then you don't have go
through and separate objects.
The UPDATE you posted in a previous thread looked like the table
blocks might also be getting dirtied in a fairly random order, which
means the table blocks are in the same condition as the index blocks
so maybe singling out the indexes isn't warranted.
Cheers,
Jeff
Jeff Janes
2012-12-13 15:10:37 UTC
Permalink
On Wed, Dec 12, 2012 at 8:46 AM, Niels Kristian Schjødt
Post by Niels Kristian Schjødt
Post by Jeff Janes
On Tue, Dec 11, 2012 at 2:04 AM, Niels Kristian Schjødt
Post by Niels Kristian Schjødt
Maybe I should mention that I never see more than about 5 GB out of my total 32 GB being in use on the server… Can I somehow utilize more of it?
What tool do you use to determine that? Is that on top of the 4GB
shared_buffers, or does it include it?
free -m
             total       used       free     shared    buffers     cached
Mem:         32075      25554       6520          0         69      22694
-/+ buffers/cache:       2791      29284
Swap:         2046        595       1451
I don't know how you get 5 GB from that, though. You have 22 GB of
cached file-system data, which for your purposes probably counts as being
utilized. Although we don't know how much of this is postgres
data files, chances are it is a pretty good chunk.
Post by Niels Kristian Schjødt
Post by Jeff Janes
How big is your entire data set? Maybe all your data fits in 5GB
(believable, as all your indexes listed below sum to < 2.5GB) so there
is no need to use more.
It doesn't. We are a search engine for used cars, and there are quite a lot of those out there :-)
But how big? More than 22GB? (you can use \l+ in psql, or du -s on
the data directory)
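For example, either of these gives a quick answer (the path assumes the
stock Debian/Ubuntu layout for 9.2):

du -sh /var/lib/postgresql/9.2/main
psql -c "SELECT pg_size_pretty(pg_database_size(current_database()));"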
Post by Niels Kristian Schjødt
However, my indexes are almost all partial indexes, which means they only cover cars that are still for sale, so in that sense the indexes themselves don't really grow, but the tables do.
So maybe this reverses things. If your car table is huge and the
active cars are scattered randomly among all the inactive ones, then
updating random active cars is going to generate a lot of randomly
scattered writing which can't be combined into sequential writes.

Do you have plans for archiving cars that are no longer for sale? Why
do you keep them around in the first place, i.e. what types of queries
do you do on inactive ones?

Unfortunately you currently can't use CLUSTER with partial indexes,
otherwise that might be a good idea. You could build a full index on
whatever it is you use as the criterion for the partial indexes,
cluster on that, and then drop it.
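Spelled out, that workaround is roughly the following (table and column
names invented for the example; note that CLUSTER takes an exclusive lock
and needs roughly the table's size again in free disk space):

CREATE INDEX cars_for_sale_full_idx ON cars (for_sale); -- full index on the partial-index predicate column
CLUSTER cars USING cars_for_sale_full_idx;              -- rewrite the table in that order
ANALYZE cars;
DROP INDEX cars_for_sale_full_idx;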

But the table would eventually become unclustered again, so if this
works you might want to implement partitioning between active and
inactive partitions so as to maintain the clustering.
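On 9.2 that would be the usual inheritance-based partitioning, sketched
here with invented names (rows have to be routed into the right child by a
trigger or by the application):

CREATE TABLE cars_active   (CHECK (for_sale))     INHERITS (cars);
CREATE TABLE cars_inactive (CHECK (NOT for_sale)) INHERITS (cars);
-- when a car is sold, move its row from cars_active to cars_inactive
-- inside one transaction, so the clustering by sale status is maintained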
Post by Niels Kristian Schjødt
Post by Jeff Janes
Post by Niels Kristian Schjødt
Post by Jeff Janes
You could really crank up shared_buffers or vm.dirty_background_ratio,
but doing so might cause problems with checkpoints stalling and
latency spikes. That would probably not be a problem during the
night, but could be during the day.
What do you have in mind here? Tweaking what parameters to what values?
I'd set shared_buffers to 20GB (or 10GB, if that will hold all of your
I had that before; Shaun suggested that I change it to 4GB, as he was talking about strange behavior when it is larger than that on 12.04. But I can say that there has not been any notable difference between having it at 4GB and at 8GB.
It is almost an all or nothing thing. If you need 16 or 20GB, just
going from 4 to 8 isn't going to show much difference. If you can
test this easily, I'd just set it to 24 or even 28GB and run the bulk
update. I don't think you'd want to run a server permanently at those
settings, but it is an easy way to rule in or out different theories
about what is going on.
Post by Niels Kristian Schjødt
Post by Jeff Janes
But if your database is growing so rapidly that it soon won't fit on
240GB, then cranking up shared_buffers won't do for long. If you can
get your tables and all of their indexes clustered together, then you
can do the updates in an order that makes IO more efficient. Maybe
partitioning would help.
Can you explain a little more about this, or provide me a good link?
If all your partial btree indexes are using the same WHERE clause,
then your indexes are already clustered together in a sense--a partial
index is kind of like a composite index with the WHERE clause as the
first column.
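For example (invented names), these two behave quite similarly for queries
that include the predicate:

CREATE INDEX cars_price_partial_idx ON cars (price) WHERE for_sale; -- partial: only cars for sale are indexed
CREATE INDEX cars_price_full_idx ON cars (for_sale, price);         -- composite: predicate column leads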

So the trick would be to get the table to be clustered on the same
thing--either by partitioning or by the CLUSTER command, or something
equivalent to those. I don't know of a good link, other than the
documentation (which is more about how to do it, rather than why you
would want to or how to design it)

Cheers,

Jeff