Discussion:
PostgreSQL 9.2.3 performance problem caused Exclusive locks
(too old to reply)
Emre Hasegeli
2013-03-08 11:27:16 UTC
Permalink
Hi,

I upgraded our master database server from 9.2.2 to 9.2.3 on Monday. We
have been experiencing performance problems since then. Yesterday, our
application hit the connection limit 5 times. It causes approximately
15 seconds of downtime. The database server hit 50 load average, then
everything came back to normal.

We have a very good database server dedicated to PostgreSQL. It has 64
cores and 200 GiB of RAM which is 2 times bigger than our database.
We run PostgreSQL on RHEL relase 6.2. The database executes 2k transactions
per second in busy hours. The server is running 1 - 2 load average
normally.

PostgreSQL writes several following logs during the problem which I never
saw before 9.2.3:

LOG: process 4793 acquired ExclusiveLock on extension of relation 305605
of database 16396 after 2348.675 ms

The relation 305605 was the biggest table of the database. Our application
stores web service logs as XML's on that table. It is only used to insert
new rows. One row is approximately 2 MB and 50 rows inserted per second
at most busy times. We saw autovacuum processes during the problem. We
disabled autovacuum for that table but is does not help. I tried to archive
the table. Create a new empty one, but it does not help, too.

We also have an unlogged table to used by our application for locking.
It is autovacuumed every 5 minutes as new rows are inserted and deleted
continuously.

Most of our configuration parameters remain default except the following:

max_connections = 200
shared_buffers = 64GB
max_prepared_transactions = 0
work_mem = 64MB
maintenance_work_mem = 512MB
shared_preload_libraries = '$libdir/pg_stat_statements'
wal_level = hot_standby
checkpoint_segments = 40
effective_cache_size = 128GB
track_activity_query_size = 8192
autovacuum = on
autovacuum_max_workers = 10

I will try to reduce autovacuum_max_workers and increase max_connections
to avoid downtime. Do you have any other suggestions? Do you know what
might
have caused this problem? Do you think downgrading to 9.2.2 is a good idea?
--
Sent via pgsql-performance mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Emre Hasegeli
2013-03-08 19:56:23 UTC
Permalink
*
*
*Fix performance problems with autovacuum truncation in busy workloads
(Jan
Wieck)*
*Truncation of empty pages at the end of a table requires exclusive lock,
but autovacuum was coded to fail (and release the table lock) when there
are conflicting lock requests. Under load, it is easily possible that
truncation would never occur, resulting in table bloat. Fix by
performing a
partial truncation, releasing the lock, then attempting to re-acquire the
lock and
continue<http://www.postgresql.org/docs/9.2/static/release-9-2-3.html#>.
This fix also greatly reduces the average time before autovacuum releases
the lock after a conflicting request arrives.*
This could be a reason of your locking.
Yes, I saw this. It is commit b19e4250b45e91c9cbdd18d35ea6391ab5961c8d by
Jan Wieck. He also seems worried in the commit message about this patch. Do
you think this is the exact reason of the problem?

I have downgraded to 9.2.2, decreased the autovacuum_max_workers to 2 from
10 and increase max_connections to 500 from 200 in the mean time. There are
not any ExclusiveLock's since then.
--
Sent via pgsql-performance mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Emre Hasegeli
2013-03-12 19:17:33 UTC
Permalink
Post by Emre Hasegeli
PostgreSQL writes several following logs during the problem which I never
LOG: process 4793 acquired ExclusiveLock on extension of relation
305605 of database 16396 after 2348.675 ms
I tried

* to downgrade to 9.2.2
* to disable autovacuum
* to disable synchronous commit
* to write less on the big tables
* to increase checkpoint segments
* to increase max connections
* to move pg_xlog to sepe

None of them helps to avoid downtimes. I could not find anything related
to it? Do you have any idea? Have you ever experience something like this?
--
Sent via pgsql-performance mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Joshua Berkus
2013-03-13 21:04:58 UTC
Permalink
Emre,
Post by Emre Hasegeli
LOG: process 4793 acquired ExclusiveLock on extension of relation
305605 of database 16396 after 2348.675 ms
The reason you're seeing that message is that you have log_lock_waits turned on.

That message says that some process waited for 2.3 seconds to get a lock for expanding the size of relation 16396/305605, which is most likely an index. This is most likely due to changes in your application, or an increase in concurrent write activity.

--Josh
--
Sent via pgsql-performance mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Jeff Janes
2013-03-14 04:53:55 UTC
Permalink
On Friday, March 8, 2013, Emre Hasegeli wrote:

PostgreSQL writes several following logs during the problem which I never
Post by Emre Hasegeli
LOG: process 4793 acquired ExclusiveLock on extension of relation 305605
of database 16396 after 2348.675 ms
The key here is not that it is an ExclusiveLock, but rather than it is the
relation extension lock. I don't think the extension lock is ever held
across user-code, or transaction boundaries, or anything like that. It is
held over some small IOs. So if it blocked on that for over 2 seconds, you
almost surely have some serious IO congestion.

And this particular message is probably more a symptom of that congestion
than anything else.

You said you rolled back to 9.2.2 and the stalling is still there. Are you
still seeing the log message, or are you now seeing silently stalls? Did
you roll back all other changes that were made at the same time as the
upgrade to 9.2.3 (kernel versions, filesystem changes/versions, etc.)?

Cheers,

Jeff
Emre Hasegeli
2013-03-14 17:07:28 UTC
Permalink
Post by Emre Hasegeli
PostgreSQL writes several following logs during the problem which I never
Post by Emre Hasegeli
LOG: process 4793 acquired ExclusiveLock on extension of relation 305605
of database 16396 after 2348.675 ms
The key here is not that it is an ExclusiveLock, but rather than it is the
relation extension lock. I don't think the extension lock is ever held
across user-code, or transaction boundaries, or anything like that. It is
held over some small IOs. So if it blocked on that for over 2 seconds, you
almost surely have some serious IO congestion.
And this particular message is probably more a symptom of that congestion
than anything else.
You said you rolled back to 9.2.2 and the stalling is still there. Are you
still seeing the log message, or are you now seeing silently stalls? Did
you roll back all other changes that were made at the same time as the
upgrade to 9.2.3 (kernel versions, filesystem changes/versions, etc.)?
I did not try with different kernel or file system. It was not because of
9.2.3, same problem occurred in both 9.2.2 and 9.2.3. Increasing max
connections make it worse. It lasts almost 15 minutes in the last time.

There were not much disk utilization while it is happening, "top" was
pointing out most of the CPU usage on the %sy column, there were no IO
wait.
I saw "allocstalls" increasing on "atop". There were a lot of slow insert
statements in the logs except ExclusiveLock waits.

We were using 64 GiB of shared buffers. RhodiumToad suggested to reduce it
on the IRC channel. It did not happen since then.

It was a real problem for us. I could not find anything related to it. I
cannot let it happen again on the production environment but I would be
happy to share more experience, if it would help you fix it.
--
Sent via pgsql-performance mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Loading...