What setup would you choose for postgresql 9.2 installation?
Niels Kristian Schjødt
2013-03-04 13:52:01 UTC
LSI MegaRAID SAS 9260-4i with four Intel SSDSC2CW240A3K5 SSDs OR four Hitachi Ultrastar 15K600 SAS drives?

My app is pretty write heavy and I have a lot of concurrent connections, around 300 (though I'm considering adding pgpool2 in front to increase throughput).

Regards Niels Kristian
AJ Weber
2013-03-04 14:04:48 UTC
Apologies for the tangential question, but how would pgpool2 "increase
throughput"? Wouldn't the same number of statements be issued by your
application? It would likely reduce the number of concurrent
connections, but that doesn't necessarily equate to "increased throughput".

-AJ
Post by Niels Kristian Schjødt
LSI MegaRAID SAS 9260-4i with four Intel SSDSC2CW240A3K5 SSDs OR four Hitachi Ultrastar 15K600 SAS drives?
My app is pretty write heavy and I have a lot of concurrent connections, around 300 (though I'm considering adding pgpool2 in front to increase throughput).
Regards Niels Kristian
Scott Marlowe
2013-03-04 14:36:53 UTC
Post by AJ Weber
Apologies for the tangential question, but how would pgpool2 "increase
throughput"? Wouldn't the same number of statements be issued by your
application? It would likely reduce the number of concurrent connections,
but that doesn't necessarily equate to "increased throughput".
This is a pretty common subject. Most servers have a "peak
throughput" that occurs at some fixed number of connections. for
instance a common throughput graph of pgbench on a server might look
like this:

conns : tps
1 : 200
2 : 250
4 : 400
8 : 750
12 : 1200
16 : 2000
24 : 2200
28 : 2100
32 : 2000
40 : 1800
64 : 1200
80 : 800
100 : 400

So by concentrating your connections at ~24 you would get maximum
throughput. Such a curve is typical for most db servers; each just has
a different "sweet spot" where max throughput occurs for a given
number of connections. Some servers fall off fast past that number,
some just slowly drop off.
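
If you want to map that curve on your own hardware, something along
these lines works; the scale factor, durations, and the database name
"bench" are placeholders to adjust:

  # one-time setup of a test database
  createdb bench
  pgbench -i -s 100 bench

  # a 60-second run at each client count; watch where tps peaks
  for c in 1 2 4 8 12 16 24 32 48 64; do
      pgbench -c $c -j $c -T 60 bench
  done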
AJ Weber
2013-03-04 14:43:51 UTC
Great info, I really appreciate the insight. Is there a FAQ or
recommended setup for running pgbench to determine where this might
be? (Is there a reason to set up pgbench differently based on the
server's cores/memory/etc.?)

Sorry if this detracts from the OP's original question.

-AJ
Post by Scott Marlowe
Post by AJ Weber
Apologies for the tangential question, but how would pgpool2 "increase
throughput"? Wouldn't the same number of statements be issued by your
application? It would likely reduce the number of concurrent connections,
but that doesn't necessarily equate to "increased throughput".
This is a pretty common subject. Most servers have a "peak
throughput" that occurs at some particular number of connections. For
instance, a common throughput graph from pgbench on a server might
look like this:
conns : tps
1 : 200
2 : 250
4 : 400
8 : 750
12 : 1200
16 : 2000
24 : 2200
28 : 2100
32 : 2000
40 : 1800
64 : 1200
80 : 800
100 : 400
So by concentrating your connections at ~24 you would get maximum
throughput. Such a curve is typical for most db servers; each just has
a different "sweet spot" where max throughput occurs for a given
number of connections. Some servers fall off fast past that number,
some just slowly drop off.
Scott Marlowe
2013-03-04 15:23:44 UTC
Post by AJ Weber
Great info, I really appreciate the insight. Is there a FAQ or
recommended setup for running pgbench to determine where this might
be? (Is there a reason to set up pgbench differently based on the
server's cores/memory/etc.?)
Well, keep in mind that pgbench may or may not represent your real
load. However, it can be run with custom sql scripts that impose a
different load than its default, so you can get some idea of where
your peak connections / throughput sits. And let's face it: if you're
currently running 500 connections and your peak occurs at 64, then a
pooler is gonna make a difference whether you set it to 64 or 100 or
48, etc.
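
As a hypothetical example of the custom-script approach, using the
tables pgbench itself creates (adjust the id range to your scale
factor; pgbench_accounts has 100,000 rows per scale unit):

  -- hot-update.sql: a stand-in for a write-heavy app transaction
  \setrandom id 1 1000000
  UPDATE pgbench_accounts SET abalance = abalance + 1 WHERE aid = :id;

Run it in place of the built-in transaction with:

  pgbench -f hot-update.sql -c 32 -j 4 -T 60 bench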

The basic starting point with pgbench is to use a scale factor of at
least 2x however many connections you'll be testing. You can also run
read-only transactions to get an idea of what the peak number of
connections is for read-only versus read/write workloads. If read-only
transactions peak at, say, 100 while r/w peaks at 24, and your app is
95% read, then you're probably pretty safe setting a pooler to ~100
conns instead of the lower 24. If your app is set up to have one pool
for read-only stuff (say reporting) and another for r/w, then you can
set up two different poolers, but that's a bit of added complexity you
may not really need.
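
Concretely, if the highest client count you plan to test is 100, the
runs might look like this (numbers purely illustrative):

  # scale factor at least 2x the highest client count tested
  pgbench -i -s 200 bench

  # read/write run near the suspected r/w sweet spot
  pgbench -c 24 -j 4 -T 120 bench

  # select-only run (-S) to find the read-only peak
  pgbench -S -c 100 -j 4 -T 120 bench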

The real danger with lots of connections comes from having lots and
lots of idle connections. Let's say you've got 1000 connections and
950 are idle. Then the server gets a load spike and queries start
piling up. Suddenly, instead of ~50 active connections, that number
starts to climb to 100, 200, 300, etc. Given the slower throughput
most servers see as the number of active connections climbs, your
server may slow to a crawl and never recover until you remove load.
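
For what it's worth, since pgpool2 was mentioned upthread: the knobs
that do the capping look roughly like this (values are illustrative,
not recommendations):

  # pgpool.conf
  num_init_children = 100   # max concurrent clients pgpool will accept
  max_pool = 1              # cached backend connections per child

  # postgresql.conf - needs headroom above num_init_children * max_pool
  max_connections = 110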
Niels Kristian Schjødt
2013-03-04 17:25:10 UTC
That's around the behavior I'm seeing - I'll be testing tonight! :-)
Richard Neill
2013-03-04 14:18:00 UTC
Post by Niels Kristian Schjødt
LSI MegaRAID SAS 9260-4i with four Intel SSDSC2CW240A3K5 SSDs OR four Hitachi Ultrastar 15K600 SAS drives?
My app is pretty write heavy and I have a lot of concurrent connections, around 300 (though I'm considering adding pgpool2 in front to increase throughput).
If you can afford it, there's no question in my mind that SSDs are the
way to go. They can be 1000 times faster for random reads.

May I suggest that you do some experiments, though - perhaps with just
one disk of each type. You can get some pretty good illustrative tests
with ordinary SATA drives in an ordinary laptop/desktop (but not via a
USB adapter). I did this originally when evaluating the (then new)
Intel X25 SSD.

The other things to note are:

* The filesystem matters. For the operation that counts most,
fdatasync(), ext2 is 2x as fast as ext4, which itself is much faster
than ext3. BUT ext2's fsck is horrid, so we chose ext4. (There's an
easy way to measure this yourself; see the pg_test_fsync sketch after
this list.)

* Will you enable the disk (or RAID controller) write cache?

* Have you enough RAM for your key tables (and indexes) to fit in
memory? If not, 64GB of RAM is cheap these days.

* In some applications, you can get a speed boost by turning
synchronous_commit off - this means that in a database crash, the last
few seconds of transactions are potentially lost, even though the
application thinks they were committed. You may find this an
acceptable tradeoff.

* Postgres doesn't write straight to the tables; changes go first to
the WAL (write-ahead log), which is written sequentially. So the
benefit of SSD performance for "random writes" is less relevant than
for "random reads".
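
Two quick, concrete checks for the fdatasync and synchronous_commit
points above (the path is a placeholder; run it against each candidate
filesystem/drive):

  # contrib's pg_test_fsync compares fsync/fdatasync methods
  pg_test_fsync -f /mnt/candidate/pg_test_fsync.out

  # postgresql.conf - trades a few seconds of durability for
  # commit speed
  synchronous_commit = off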


Lastly, don't overdo the concurrent connections. You may end up with
less throughput than if you let postgres devote more resources to each
request and let it finish faster.


Hope that helps,

Richard