Discussion:
Risk of data corruption/loss?
Niels Kristian Schjødt
2013-03-13 15:24:03 UTC
I'm considering the following setup:

- Master server with a battery-backed RAID controller and 4 SAS disks in RAID 0, so NO mirroring here, because of maximum-performance requirements.
- Slave server set up with streaming replication on 4 HDDs in RAID 10. The setup will use synchronous_commit=off and synchronous_standby_names = ''

As you might have noticed, there is clearly a risk of data loss, which is acceptable since our data is not very crucial. However, I am having a hard time figuring out whether there is a risk of total data corruption across both servers in this setup. For example, suppose something goes wrong on the master and the WAL files get corrupted. Will the slave apply the WAL files INCLUDING the corruption (e.g. an unfinished transaction), or will it automatically stop restoring at the point just BEFORE the corruption, so that my only loss is the data AFTER the corruption?

Hope my question is clear
--
Sent via pgsql-performance mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Jeff Janes
2013-03-13 17:13:28 UTC
On Wed, Mar 13, 2013 at 8:24 AM, Niels Kristian Schjødt wrote:
Post by Niels Kristian Schjødt
- Master server with battery back raid controller with 4 SAS disks in a
RAID 0 - so NO mirroring here, due to max performance requirements.
- Slave server setup with streaming replication on 4 HDD's in RAID 10. The
setup will be done with synchronous_commit=off and
synchronous_standby_names = ''
Out of curiosity, in the presence of BB controller, is
synchronous_commit=off getting you additional performance?
Post by Niels Kristian Schjødt
So as you might have noticed, clearly there is a risk of data loss, which
is acceptable, since our data is not very crucial. However, I have quite a
hard time figuring out, if there is a risk of total data corruption across
both server in this setup? E.g. something goes wrong on the master and the
wal files gets corrupt. Will the slave then apply the wal files INCLUDING
the corruption (e.g. an unfinished transaction etc.), or will it
automatically stop restoring at the point just BEFORE the corruption, so my
only loss is data AFTER the corruption?
It depends on where the corruption happens. WAL is checksummed, so the
slave will detect a mismatch and stop applying records. However, if the
corruption happens in RAM before the checksum is taken, the checksum will
match and it will attempt to apply the records.
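
The distinction above can be illustrated with a toy model. This is only a sketch and not PostgreSQL's actual WAL format: the record layout and the names `make_record` and `replay` are hypothetical, and `zlib.crc32` merely stands in for the per-record checksum. It shows replay stopping at the first record whose checksum fails, while a record that was damaged before its checksum was computed passes validation and is applied.

```python
import struct
import zlib


def make_record(payload: bytes) -> bytes:
    """Frame a payload as: 4-byte length, 4-byte CRC-32, then the payload."""
    return struct.pack("<II", len(payload), zlib.crc32(payload)) + payload


def replay(stream: bytes) -> list[bytes]:
    """Apply records in order; stop at the first one that fails validation."""
    applied = []
    pos = 0
    while pos + 8 <= len(stream):
        length, crc = struct.unpack_from("<II", stream, pos)
        payload = stream[pos + 8 : pos + 8 + length]
        if len(payload) != length or zlib.crc32(payload) != crc:
            break  # torn or corrupted record: treat it as the end of the valid log
        applied.append(payload)
        pos += 8 + length
    return applied


good = [b"insert 1", b"insert 2", b"insert 3"]
stream = b"".join(make_record(p) for p in good)

# Case 1: damage on disk or in transit, AFTER the checksum was computed.
damaged = bytearray(stream)
offset = len(make_record(good[0])) + 8  # first payload byte of record 2
damaged[offset] ^= 0xFF
after_checksum = replay(bytes(damaged))  # replay stops before record 2

# Case 2: damage in RAM, BEFORE the checksum was computed.
bad_in_ram = [b"insert 1", b"insXrt 2", b"insert 3"]
before_checksum = replay(b"".join(make_record(p) for p in bad_in_ram))
# every record validates, so the damaged payload is applied as well
```

In the first case only the data after the corruption is lost; in the second the checksum cannot help, because it was computed over bytes that were already wrong.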

Cheers,

Jeff
Niels Kristian Schjødt
2013-03-13 17:34:19 UTC
Post by Niels Kristian Schjødt
- Master server with battery back raid controller with 4 SAS disks in a RAID 0 - so NO mirroring here, due to max performance requirements.
- Slave server setup with streaming replication on 4 HDD's in RAID 10. The setup will be done with synchronous_commit=off and synchronous_standby_names = ''
Out of curiosity, in the presence of BB controller, is synchronous_commit=off getting you additional performance?
Time will show :-)
Joshua Berkus
2013-03-13 21:18:38 UTC
Niels,
Post by Niels Kristian Schjødt
- Master server with battery back raid controller with 4 SAS disks in
a RAID 0 - so NO mirroring here, due to max performance
requirements.
- Slave server setup with streaming replication on 4 HDD's in RAID
10. The setup will be done with synchronous_commit=off and
synchronous_standby_names = ''
Assuming you're making the master high-risk for performance reasons, I'd be concerned that the standby would not keep up.
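
One way to watch whether the standby keeps up is to compare the WAL location the master has written with the location the standby has replayed. The sketch below assumes the two locations have already been fetched as text (on the 9.x releases, for example, from pg_current_xlog_location() on the master and pg_last_xlog_replay_location() on the standby). The helper names `lsn_to_bytes` and `replay_lag_bytes` are hypothetical. It treats a location as a plain 64-bit position, which matches 9.3 and later; on older releases the result should be read as an approximation.

```python
def lsn_to_bytes(lsn: str) -> int:
    """Convert a textual WAL location such as '16/B374D848' to a byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def replay_lag_bytes(master_lsn: str, standby_lsn: str) -> int:
    """Bytes of WAL the standby still has to replay (0 when fully caught up)."""
    return lsn_to_bytes(master_lsn) - lsn_to_bytes(standby_lsn)


# Example values, made up for illustration: the standby is one 16MB segment behind.
lag = replay_lag_bytes("0/3000000", "0/2000000")
```

A lag that keeps growing under normal load would suggest the RAID 10 standby cannot absorb the write rate of the RAID 0 master.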
Post by Niels Kristian Schjødt
So as you might have noticed, clearly there is a risk of data loss,
which is acceptable, since our data is not very crucial. However, I
have quite a hard time figuring out, if there is a risk of total
data corruption across both server in this setup? E.g. something
goes wrong on the master and the wal files gets corrupt. Will the
slave then apply the wal files INCLUDING the corruption (e.g. an
unfinished transaction etc.), or will it automatically stop
restoring at the point just BEFORE the corruption, so my only loss
is data AFTER the corruption?
Well, in general RAID 1 really just protects you from HDD failure, not from the more subtle types of corruption that occur onboard an HDD. So in that respect, you haven't increased your chances of data corruption at all. If the master loses a disk, it should just stop operating; a simple check on the standby that all WAL files are 16MB would do the rest. I'd be more concerned that you're likely to be yanking and completely rebuilding the master server every 4 or 5 months.
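
The size check mentioned above could look roughly like the following. This is a minimal sketch under stated assumptions: the function name `find_bad_segments` and the directory argument are hypothetical, the segment size is the default 16MB, the files are stored uncompressed, and segment files are recognised by their 24-character hexadecimal names. It only checks file sizes and says nothing about the contents.

```python
import os
import re

WAL_SEGMENT_BYTES = 16 * 1024 * 1024  # assumption: default WAL segment size
SEGMENT_NAME = re.compile(r"^[0-9A-F]{24}$")  # e.g. 000000010000000000000003


def find_bad_segments(wal_dir: str, expected: int = WAL_SEGMENT_BYTES) -> list[str]:
    """Return the names of WAL segment files whose size differs from `expected`."""
    bad = []
    for name in sorted(os.listdir(wal_dir)):
        if not SEGMENT_NAME.match(name):
            continue  # skip history files, backup labels and anything else
        if os.path.getsize(os.path.join(wal_dir, name)) != expected:
            bad.append(name)
    return bad
```

Calling `find_bad_segments` on the directory where the standby keeps its WAL files and alerting on a non-empty result would flag truncated segments.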