This paper gives an excellent detailed description of the online reorganization utility used in IBM's DB DBMS. The data structures and algorithms are exposed with great clarity. The paper is well worth reading for the insight it affords on industrial strength software.

The motivation for the reorganization discussed in this paper is the restoration of clustering. In DB2, the clustering index is a B-tree with keys and RIDs (Record IDentifiers) in the leaves. RIDs consist of a page number and a slot number in the page. As records are inserted or deleted, as long as there is space in the page, the records remain clustered. RIDs are never changed except during reorganization. If (variable-length) records grow too big to fit the page, the data is moved to another page and the existing record in the original page becomes a pointer record which contains the RID of the overflowed data record. The pointer adds an extra disk access to record retrieval. Since there is not always room for new records and since growing records cause performance degradation due to pointers, it is periodically advisable to reorganize to regain clustering and remove forwarding pointers.

The basic method, very nicely explained in this paper, is to copy the data, sort the copy, bring the copy up to date by applying log records and finally switch to the new copy. It is the abundance of detail that makes the paper interesting reading. One complication that arises in this algorithm is the difficulty in finding the data records in the new copy when you want to apply log records. The log records only have the old RID in them. The reorganization process maintains a temporary table called a mapping table that maps between the old and new RIDs. The bulk of this paper explains the interaction between the mapping table and the processing of the log records. First, the log is scanned to translate the RIDs. A set of "prefixes" is made with the old RID, the new RID from the table (or an estimated new RID in the case of insertion) and the log sequence number of the log record. Then the prefixes are sorted by new RID, so the updates can be made in order of the new RID and so that all the updates for one given record are consecutive in the prefix list. Finally, the actual insertions, updates and deletes are applied to the new copy of the table. Details of each sub-case (insert, delete, or update and pointer or data record) are included. One lesson here is that not having unique primary keys for data records and not having these primary keys in the log record makes reorganization more difficult. Some other systems which use the same basic method have primary keys in log records (e.g. Tandem [2], Clustra [3]).

a service of  Schloss Dagstuhl - Leibniz Center for Informatics