Are adjacent detail entries likely to have the same key value?

| Front Page | News Headlines | Technical Headlines | Planning Features | Advanced Search |

3K Associates Sponsor Message

December 2002

Boosting Your e3000 Productivity

Are adjacent detail entries likely to have the same key value?

By Bob Green

In last month’s column I discussed the mechanics of IMAGE master datasets and how performance is impacted by the percent emptiness or fullness of the dataset. This month we explore detail datasets and one of their performance factors.

What is the likelihood that entries with the same key are located at adjacent locations in a detail dataset?

When we retrieve the entries for a customer from a sales detail dataset, we do a DBFIND by the customer number, then a DBGET mode 5 to read each detail entry for that customer. Our natural tendency is to think that since these detail entries are logically “grouped” they are probably physically grouped as well. If they are, then retrieving them should be very quick. However, if they are likely to be physically far apart on the disk hardware, then performance will be much less.

The entries we are retrieving have something in common — they have the same key value. They may even be in the same disk page. If so, and there are 20 entries per page (4K bytes divided by 20 equals about 200 bytes per entry), then we would use 1/20th of a disk read per entry. Sounds fast!

But what are the odds that the entries with the same key value are adjacent?

To understand when they will be adjacent and when not, we must understand how entries are stored in detail datasets.

Internally, all IMAGE datasets are organized into logical blocks of entries, with the number of entries per block being the blockfactor. The block organization is a hangover from the original design of IMAGE on the Classic HP 3000 computers (MPE V). On MPE/iX, IMAGE actually performs page-level I/O. A page has 4,096 bytes. This means that one or more IMAGE blocks are read or written as part of each I/O operation.

Whether dealing with blocks or pages, the goal is to minimize the number of I/Os required to access the database.

Starting from An Empty Detail Dataset

When a detail dataset is empty, IMAGE adds records in the order that they are created, starting from record location one.

Records are stored in chronological sequence. That means records for the same customer are not necessarily stored consecutively, unless they happened to be added at the same time.

Assuming that customer ID is a key, then records with the same customer ID are linked together with pointers to form a chain. The first and last records in a chain are also linked from the corresponding master record for that customer ID.

These chains by customer ID are used when you do retrievals by customer ID. IMAGE finds the matching entry in the master dataset, then follows the path chain to retrieve the detail entries.

Multiple Search Keys

Remember that only one search path per detail dataset can be optimized. If the customer ID path into sales details is optimized, then the part number path CANNOT be optimized. The physical data can only be sorted one way.

It is tempting to add a search path for every kind of search that your application requires, but this is a bad strategy: “Paths are for people.” They exist so that on-line users can retrieve and examine the entries by that path. Paths are not necessary for reports that run only once a day — you can always use Suprtool and a serial scan instead, but that is another article.

“Entropy” of Detail Datasets

As IMAGE adds more records to the dataset, the chains for any given key value get longer and span more blocks. When records are deleted, their space is added to a “delete chain” and subsequent additions are fulfilled from this list, rather than being added to the end of the dataset.

Therefore, the more active the dataset is, in terms of additions and deletions, the more likely that adjacent records on a chain will be in different blocks. Over time, records for the same customer are scattered over multiple blocks.

As a result, IMAGE has to perform a lot of extra disc IO to read a whole chain for a given customer. In the worst case scenario (but a common one for active datasets), IMAGE does one disc IO for each record in the chain.

Conclusion: it is possible for related entries to be adjacent, but only if the related entries were added at the same time and no entries have ever been deleted. For any real-world dataset where entries are added and deleted dynamically over time, it is most likely that the related entries are not adjacent. A good rule of thumb is to assume an extra disk read for each entry that is retrieved.

Repacking a Detail Dataset

You can often improve performance by repacking a detail dataset. The repacking process usually groups all records along their “primary path” values. In other words, all the records for a customer are put into consecutive locations (if customer ID is the primary path). Depending on the chain length, a single disk I/O may read all the appropriate records. The repacking default in Adager’s Detpack and DBGeneral’s dataset reorganization feature is along the primary path. For various reasons, you may not want or be able to change the primary path. If that is the case, both products give you the option to use any other path in the dataset.

Repacking removes the delete chain. It groups all the records at the beginning of the dataset without leaving empty locations in the middle of blocks.

Because most tools repack a dataset along the primary path, it is essential to pick the right one. A primary path should be the one with the longest average chain length (you can get this figure from Robelle’s How Messy report), that is also accessed the most frequently.

There is nothing to gain by selecting as your primary path a path with an average chain length of 1, since there is never another record with the same key value to retrieve.

Repacking only improves statistics for the specific path that is repacked. Other paths will usually not be improved in efficiency by a repacking.

For more information on how the internal mechanisms of IMAGE impact performance, read this Robelle tutorial “IMAGE Internals and Performance” at www.robelle.com/library/tutorials/pdfs/imgperf.pdf

Copyright The 3000 NewsWire. All rights reserved.