Wednesday, October 31, 2012

Identifying Hidden Columns in Oracle

Sometimes an Oracle system will start having some hidden columns. This is quite normal behaviour if views or dependants change or are dropped.

However, for an SAP system, this is not desirable behaviour, so I sometimes have to remove hidden columns.
The downside, though, is that first I have to find them. Fancy scripts exists for this, but I find that a single line of code and some formatting is enough.

SET PAGESIZE 1000
SET LINESIZE 200
COLUMN owner FORMAT a10
select distinct owner,table_name,hidden_column from all_tab_cols where hidden_column = 'YES' and owner != 'SYS';

This excludes the sys objects that are not really manipulated by SAP anyway. So the only thing left is to reorganize the tables listed.... I'll get that done this weekend.... Right after the gym....

Ora-26040 block corruption

I never saw this one before today. And I really wondered for a while what it could be. It was an Oracle error ora-1578 wrapped around an ora-01110, wrapped around an ora-26040 wrapped around an ora-06512 etc. etc. It was a LOOOONG statement. But it was the error 26040 that was new to me, and that I saw as the point of attack.



And the funny part was that this error didnt even show up during backup runs or similar. I just got alerted to some shortdumps in the SAP BW system....


So my first step was ofcourse to check how many block were corrupted, and if they were located in temporary, index or other easily recreateable blocks.
I quickly logged on to the system as Oracle administrator, and logged on to the RMAN interface and ran this querry: backup validate check logical database;

Much to my dismay, upon looking over the results (in SQLPLUS: select * from v$database_block_corruption;) I found more than 23000 corrupt blocks, even the mere thought of restoring this many blocks were disheartening to me. And this many corrputions means that there's something causing the errors, something that I needed to fix before getting rid of the corruption. So I decided to investigate the cause of the corruption.

What I found was slightly surprising. When I went through my list, I noticed that all of the corruptions were of the type "nologging". So I decided this might be the clue I needed and started digging. In the end it turned out that a BW system is loaded with a lot of reporting data from various sources. And someone had started a load with the option "nologging", which means that the undo segments will not be enabled. This is great if you're loading data larger than what can be put in the undo area, but for daily operations, not so much. As it happens, once a load done in nologging mode gets cancelled, there's no option to do a rollback, so the block written get marked as corrupt, if the transaction cancels (which one of them did)

That also explained why I hadn't seen this error before, noone in their right mind should load business critical data without the option to recover :)

It all turned out well, as the developers got the information and started doing the loads correctly, and I just restored the database...

Friday, October 26, 2012

Orphaned Oracle processes

For some unfathomable reason, the backup system here at work seems to stop running once in a while.
This causes the RMAN backup to stop (but not fail), which causes the underlying Oracle shadow processes to hang, because they assume they'll get to continue their work later.

The proceses takes up large amounts of memory and along with various resource locks, starts to cause a bother for the users. So we have to identify and kill those processes.

So I log on as the DB owner and run sqlplus, where I issue this command:

SELECT spid FROM v$process WHERE NOT EXISTS
( SELECT 1 FROM v$session WHERE paddr = addr);

This will display all spids in the v$process that does not have a parrent session.

Then I simply try the unix commands "kill -3" (quit), if this fails then "kill -1" (hangup) and if this fails "kill -9" (abort)


I did, however, at a later date find a nice note indicating that  as of Oracle 10.2, there is an alternative way to kill sessions that will also kill the OS process (from sqlplus):

ALTER SYSTEM DISCONNECT SESSION ’sid,serial#’ POST_TRANSACTION;
ALTER SYSTEM DISCONNECT SESSION ’sid,serial#’ IMMEDIATE;

 The POST_TRANSACTION clause waits for ongoing transactions to complete before disconnecting the
session, while the IMMEDIATE clause disconnects the session and ongoing transactions are
recovered immediately.


So I'll try the latter method next time this happens, as this would likely cause less collateral damage in the system....