To Multiplex Redo Logs or Not To Multiplex Redo Logs
DarrylBSmith
November 3rd, 2011 09:00
That is the question, and I would love to hear what other people are doing. I am a believer that redo logs no longer need to be multiplexed. Corrupt writes to redo logs are increasingly rare: in my experience I have seen or heard of only one in the last five years of my career, working with hundreds of databases, many of which are highly transactional.
My quest to remove multiplexing started with those highly transactional databases, which always had heavy waits on redo writes. This is especially true while doing remote synchronous replication to a DR site. Many people argue that corruption is still possible, and that the performance hit to highly transactional databases is a necessary price to protect the database, just in case.
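To give a sense of what "heavy waits on redo writes" looks like in numbers, here is a rough check against the standard v$system_event view (just a sketch; your thresholds will vary):

-- Cumulative redo-write wait profile since instance startup.
-- 'log file sync' is what sessions wait on at commit time;
-- 'log file parallel write' is the LGWR write itself.
SELECT event,
       total_waits,
       ROUND(time_waited_micro / NULLIF(total_waits, 0) / 1000, 2) AS avg_wait_ms
FROM   v$system_event
WHERE  event IN ('log file sync', 'log file parallel write');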
I am very interested to get other perspectives/arguments for or against.
jweinshe
November 7th, 2011 09:00
I'll take the always-safe answer of "it depends". I do run multiplexed redo logs, and I see no reason not to if you're not hitting the performance bottleneck you're seeing with highly transactional databases.
In the event you are running highly transactional databases and are experiencing heavy waits on redo writes, there are a few things you can do to mitigate the performance impact. There's no reason the multiplexed redo log members need to be on the same disks/LUN, and given that part of the reason you're multiplexing is to avoid losing redo logs to a disk failure or corruption during a write, it is in your best interest to always write the members of each redo log group to different sets of physical disks, as in the sketch below.
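As a minimal sketch (the paths are hypothetical), adding a second member to each log group on a mount point backed by separate physical disks looks like this:

-- Assumes groups 1 and 2 currently have one member each under /u01,
-- and that /u02 sits on a different set of physical disks.
ALTER DATABASE ADD LOGFILE MEMBER '/u02/oradata/ORCL/redo01b.log' TO GROUP 1;
ALTER DATABASE ADD LOGFILE MEMBER '/u02/oradata/ORCL/redo02b.log' TO GROUP 2;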
As far as DR - again, sadly, it depends. Depending on the architecture and the DR method (RecoverPoint, Oracle Data Guard, etc.), it may be possible to replicate only one member to the DR site (for example, with RecoverPoint, by choosing which LUNs to replicate), or to compress the redo traffic at the database level (11g Oracle redo transport compression; see the sketch below) or even at the network level (Cisco WAAS devices and Silver Peak can both compress Celerra traffic). Leveraging what you already have in place may make more sense than buying licenses/hardware for other solutions, and at that point you're talking about a cost/benefit decision for the business, not a technical one.
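For the 11g redo transport compression option, a rough sketch (the destination number and service name are placeholders, and the COMPRESSION attribute requires the Advanced Compression license):

-- Hypothetical Data Guard destination; COMPRESSION=ENABLE is the 11g
-- redo transport compression attribute of LOG_ARCHIVE_DEST_n.
ALTER SYSTEM SET log_archive_dest_2 =
  'SERVICE=standby_db SYNC COMPRESSION=ENABLE' SCOPE=BOTH;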
Finally, on the actual mechanics of the heavy waits on redo writes, you might want to check out this post by Kevin Closson, who now works in the Greenplum area of EMC: http://kevinclosson.wordpress.com/2007/07/21/manly-men-only-use-solid-state-disk-for-redo-logging-lgwr-io-is-simple-but-not-lgwr-processing/ . Honestly, it's a bit more technical than I can summarize.
BartS
November 9th, 2011 02:00
Hi Jay,
I had a few good discussions with some of my customers on this. A lot of myths exist around the level of extra protection it actually provides.
My personal opinion, backed by challenging discussions with my peers: if you run on EMC (and enjoy the excellent RAID protection, cache/disk scrubbing, power destage, disk checksumming, hot spares, etc.), then there is no real technical reason to mirror redo logs - maybe only political/support reasons.
More on my blog: http://bartsjerps.wordpress.com/2011/03/23/duplexing-redo-logs/
I'd like to hear your thoughts :-)
jweinshe
November 9th, 2011 06:00
I'll check out your blog - thanks!
Honestly, until Darryl brought it up, I had never really thought about it - but then I don't see systems pushing as many transactions as Darryl is running.
Admittedly this is an EMC forum, but Darryl never said it had to be running on EMC hardware.
In this same area (multiplexing of files), do you feel that multiple control files are also not needed, technically speaking?
DarrylBSmith
November 9th, 2011 06:00
Multiplexing of control files, in my opinion, is essential. If someone were to accidentally delete one, your database would be just a bunch of bytes; as long as you have a control file and a backup, your database is recoverable. I'm not so worried about corruption as about accidental loss, and contention on control files tends to be much more manageable.
BartS
November 9th, 2011 08:00
Darryl,
I see your point... but I cannot imagine how someone could *accidentally* delete just a redo logfile, and not other datafiles...
Furthermore, if you use Oracle ASM then you can't even do this with a regular Unix command.
Someone foolish enough to manage to get the redo log destroyed will most likely cause much more damage. For those situations we have snaps/clones and continuous data protection.
Besides, if you duplicate the redo logs, then the same fool would probably delete both in one go. Then the database is still dead beef.
I discussed exactly the same points with my peers at customers. Totally understand the urge to try to make things monkey proof. However, the chance of being saved by redo mirroring after someone does something stupid - without causing much more damage - is, IMHO, very minimal...
A better solution: separate the ownership of datafiles/redo logs from the IDs of the DBA/Unix admins, so that DBAs cannot directly access datafiles (including redo), and implement sudo for tasks that would require root or "dba" access.
DarrylBSmith
November 9th, 2011 12:00
The question was about multiplexing of the control files. It is more about recoverability: we had a Unix admin wipe out the diskgroup that my two control files were in. Fortunately this was in my lab, but I was not able to restore the database. I learned my lesson, and now my control files are in two different diskgroups, along the lines of the sketch below.
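For anyone wanting to do the same, a minimal sketch (the diskgroup and path names are made up):

-- Keep each control file copy in a different ASM diskgroup so that
-- wiping one diskgroup cannot take out both copies.
ALTER SYSTEM SET control_files =
  '+DATA1/ORCL/CONTROLFILE/control01.ctl',
  '+DATA2/ORCL/CONTROLFILE/control02.ctl'
  SCOPE=SPFILE;
-- Takes effect after copying the control file to the new locations
-- (e.g. with RMAN) and restarting the instance.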
BartS
November 10th, 2011 01:00
Ah, control files... missed that - I thought we were discussing redo logs...
IMO, duplexing control files does not hurt: they do not generate massive I/O overhead and they are fairly small, so their frequently updated blocks will stay in cache anyway.
Putting them in different file systems or diskgroups, as Darryl suggests, sounds like a good idea to me.
mkaberle
November 10th, 2011 10:00
Great question.
I got involved in this discussion at a customer site a few months back. Everyone bantered back and forth, but nobody came up with a silver-bullet reason why the redo logs should be multiplexed. And it was a "lively" discussion (if you know what I mean) between the DBA, sysadmin, and storage admin folks.
I also called other, neutral DBA friends of mine for a sanity check on this question, and they came up with the same answers as we did at the customer site. One of them even said, "well, that is what Oracle recommends".
Nevertheless, after the meeting and up until today, I suspect they have continued to do things as they always have, which is to multiplex the RAID 1 redo logs. But the storage admin has some new additional info to take back to his manager's manager on disk space consumption.
Now, why did this question come up at the customer site?
Well, I was there talking about ASM/VP/TF/SRDF/DB storage layout thoughts, and the lead storage admin asked me whether it made sense to multiplex RAID 1 redo logs.
I replied with a few questions: "Do you need four copies of the redo logs on disk?" "Have you ever had corrupted redo logs?" "Do you have redo log performance issues?" "Why do you ask this question?"
The answers were:
jeff_browning
November 11th, 2011 06:00
Darryl:
Interesting question. I just ran into a customer who created a performance issue with multiplexed redo logs. Basically, this customer had created a single RAID group, then created a single LUN on that RAID group, and then, you guessed it, placed both sides of the multiplex on this LUN. (The LUN was configured as a single-LUN ASM diskgroup.)
The result was that the database showed log file sync waits in the hundreds of milliseconds, and this was the number one wait event, constituting about 40% of database time.
The solution, of course, was to configure a second RAID group and put the other half of the multiplex there - which raises exactly the question you pose.
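In case it helps anyone, the fix looks roughly like this (the diskgroup names and member path are hypothetical): add a member of each group on the new diskgroup, then drop the co-located member once its group is inactive.

-- +REDO2 is a new diskgroup on the second RAID group.
ALTER DATABASE ADD LOGFILE MEMBER '+REDO2' TO GROUP 1;
ALTER DATABASE ADD LOGFILE MEMBER '+REDO2' TO GROUP 2;
-- Drop the old co-located member only when its group shows
-- STATUS = 'INACTIVE' in v$log (switch logs first if needed):
ALTER DATABASE DROP LOGFILE MEMBER '+REDO1/ORCL/ONLINELOG/redo01b.log';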
The standard answer I get to this question is the one proposed earlier: it allows a stupid DBA to delete an entire directory (or the contents of a directory) without destroying data. Since redo logs contain the only copy of transactional data until the logs are archived, or until a checkpoint forces the data to the datafiles, they are a potential single point of failure, and the most obvious exposure is the stupid-DBA risk.
Of course, I don't generally tend to regard myself as a stupid DBA. (I have my off days but still....)
My point is that I don't usually attempt to protect myself from doing stupid things. Absent a very messy, undisciplined environment (which would be a problem in itself), it strikes me that multiplexing redo logs is a big cost for minimal gain. Simply don't do silly things like deleting entire directories!
Perhaps EMC should weigh in on this one. Up till now, all of our Proven Solutions for Oracle have had multiplexed redo logs (with one exception, so far as I know). Given that we provide world-class RAID protection, and that redo logs typically end up on RAID 10, the risk of loss from disk failure is pretty well mitigated. It would be an interesting project to calculate the exact performance overhead of multiplexing redo logs, even when they are correctly configured on disk.
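As a starting point for such a project, the current degree of multiplexing is easy to inventory from the standard views:

-- One row per redo log member; MEMBERS > 1 means the group is multiplexed.
SELECT l.group#,
       l.members,
       l.bytes / 1024 / 1024 AS size_mb,
       f.member
FROM   v$log l
       JOIN v$logfile f ON f.group# = l.group#
ORDER  BY l.group#, f.member;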
Regards,
Jeff
SKT2
November 17th, 2011 03:00
All/most of the features Bart described are available in any enterprise array (though I did not understand the "power destage" one?). His point was mostly focused on the robustness of the storage these days.