Wednesday, October 31, 2012

Ora-26040 block corruption

I never saw this one before today. And I really wondered for a while what it could be. It was an Oracle error ora-1578 wrapped around an ora-01110, wrapped around an ora-26040 wrapped around an ora-06512 etc. etc. It was a LOOOONG statement. But it was the error 26040 that was new to me, and that I saw as the point of attack.



And the funny part was that this error didnt even show up during backup runs or similar. I just got alerted to some shortdumps in the SAP BW system....


So my first step was ofcourse to check how many block were corrupted, and if they were located in temporary, index or other easily recreateable blocks.
I quickly logged on to the system as Oracle administrator, and logged on to the RMAN interface and ran this querry: backup validate check logical database;

Much to my dismay, upon looking over the results (in SQLPLUS: select * from v$database_block_corruption;) I found more than 23000 corrupt blocks, even the mere thought of restoring this many blocks were disheartening to me. And this many corrputions means that there's something causing the errors, something that I needed to fix before getting rid of the corruption. So I decided to investigate the cause of the corruption.

What I found was slightly surprising. When I went through my list, I noticed that all of the corruptions were of the type "nologging". So I decided this might be the clue I needed and started digging. In the end it turned out that a BW system is loaded with a lot of reporting data from various sources. And someone had started a load with the option "nologging", which means that the undo segments will not be enabled. This is great if you're loading data larger than what can be put in the undo area, but for daily operations, not so much. As it happens, once a load done in nologging mode gets cancelled, there's no option to do a rollback, so the block written get marked as corrupt, if the transaction cancels (which one of them did)

That also explained why I hadn't seen this error before, noone in their right mind should load business critical data without the option to recover :)

It all turned out well, as the developers got the information and started doing the loads correctly, and I just restored the database...

No comments:

Post a Comment