Thursday, December 6, 2012

Reasons for reorganizing an Oracle database

I've been the witness of many discussions about what benefit reorganizing a database has.
First in the days of old, when doing so would cost more more work then what was saved
Second when someone started using indexes and said "now we never need to reorganize our data again"
Again when someone said that reorganizing is just "reclaiming diskspace, and that's pretty cheap now"
There there was the "LMTS will reuse all the diskspace easily"
I've even heard someone use the excuse that: "The data is on a SAN, it will reorganize the data internally", and the latest one: "I'm using data compression, so there will be no wasted space"
I'm pretty sure with the move on SAP HANA, I'll soon hear the argument: "its in-memory computing"

But the fact of the matter is that for all databases, indexed or not,  if there's a lot of changes in the data, it will get jumbled around. So there'll always be a need (how big will depend on the business) to reorganize the data IF one of the following conditions are true:

1) Reporting reads data sequentially.
2) Your index consists mainly of inserts in one side and thus becomes uneven.
3) You need to reclaim diskspace after deleting data
4) Select statements issued can make use of your data being sorted
5) You need to move data around

And there are other conditions too, but these are the ones I most commonly run into.

The reason for (1) is simply that once data has ben entered into an Oracle database, a highwater mark is set. And a sequential read will even borther to read the empty blocks (I have yet to figure out why this behavior is, but I'm sure there's a good reason to be found somewhere). Thus if you remove and insert a lot of data (like say during a BW load) into your database, you may end up with a lot of empty blocks that sometimes get read, when they dont have to. This will cause performance degredation. Arguably, you can also reclaim wasted space, but if you DO have all these changes continually, you'll likely not be saving a lot of space in the long run. Also, it's worth noting that HOW your inserts are made makes a difference as to weather or not the highwatermark is being pushed.
As for (2), one can make the argument that you really only need to reorganize your index. But in reality, an index is just a table too. And an index that is very heavily loaded with data on only one side (say financial postings by date, will likely consist mainly of newer and newer dates), will get slower and slower to use, as your index becomes deeper. Reorganizing the index will save you absolutely no space, but will likely give you a small performance boost. One could also make an argument that data sorted by date can benefit from partitioning. And this is oftentimes true, but that's a matter for another post.
(3) is pretty selfexplanatory. Oftentimes large dataloads, exports or other is run against a database just once. And if the diskspace is needed for other use, there's no reason why we as administrators shouldn't remove the excess empty space, and reclaiming space from the filesystem requires that the datafiles become smaller, for them to be smaller, we need to shrink (and move) the tables, for that to happen, we need to reorganize. This is the case even if you're using datacompression on your data or indices. Because it is passive cmompression. Once data gets deleted or moved, you will still have a hole in your block.
If at all possible, when someone comes to me in advance and tells me of their temporary need, I create a new tablespace on a new LUN, if this is possible, so that I can easily reclaim the space later, and even drop the LUN back into the diskpool.
For performance reasons, it's sometimes a good idea to reorg your data sorted, which leads to (4). Logically, if your select statement makes good use of data being sorted (say with a "between" statement), it would help if your data was sorted by whatever makes sense. The downside to reorganizing this way is that we as administrators needs to know a lot about the application and what querries are being run against it. Because what's good for the goose, might not be good for the gander. Sorting data one way, without looking at how it was previously sorted, might even cause a performance degradation.
Moving data (5) can frequently be done by setting a datafile offline, moving it, and bringing it online again in a new place. But this requires downtime (at the very least on the filelevel). With reorganization tools like Quest livereorg, or using a reasonably new version of Oracle, there's a redefinition package that will allow us as administrators to recreate a table in a different tablespace, easily. Moving individual datafiles (online) is still a problem, but even as painfully slow as it is, you can actually move the data pieces (tables) inside a datafile one at a time. If you need to. Once someone creates a product that can move ONLY the pieces inside a particular datafile, I'll be more than happy to endorse or even use their product. Meanwhile, for me, it's still wishfull thinking whenever I have to (online) move data to a new storage area or similar.

On top of all of this there's the whole issue of read vs write performance. What's more important ? There's the entire cost angle. If I spend 10 hours making the system save y number of CPU cycles/day, or x amount of I/O operations/day, is that profitable ? After a couple of months the users will get used to slow responsetimes anyway....

So in conclusion, yes, I can find many good arguments for reorganizing a database, but none of them are "because someone said it was a good idea". Reorganization is a reasonably big thing to do, and it will impact performance. But if used correctly, it can save your users a lot of time and griefs. Because as a strategic tool, reorganization can increase the performance of your system.

And now I'm off to report on the last reorganization effort :D