June 2003 | Boosting your e3000 productivity

Zero Secondaries

If secondary entries slow down a master dataset, would zero secondaries be ideal? No! Just the opposite. If your
master dataset has no secondaries, it means that the key values are
not hashing randomly. Instead, they are hashing into some regular
pattern (usually into adjacent locations). Figure 1 below shows what
a properly random dataset looks like, with P for primary and
S for secondary, and underneath it what a dataset with no
secondaries usually looks like:
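To see why a healthy hashed master dataset always contains some secondaries, here is a toy simulation in Python. This is not IMAGE's actual hash algorithm; the key format, capacity, and placement of secondaries are invented for illustration only:

```python
# Toy model of a hashed master dataset (NOT IMAGE's real hash function).
# Each key hashes to a "primary" location; a key whose location is already
# taken becomes a "secondary", placed in a nearby free slot.
import hashlib

CAPACITY = 20

def toy_hash(key: str) -> int:
    """Pseudo-random primary location for an ASCII key (illustrative only)."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % CAPACITY

slots = {}                              # location -> 'P' or 'S'
for i in range(12):
    key = f"CUST{i:04d}"                # hypothetical ASCII key values
    loc = toy_hash(key)
    if loc not in slots:
        slots[loc] = 'P'                # primary: its hashed location was free
    else:
        # secondary: scan forward for a free slot (simplified placement)
        probe = (loc + 1) % CAPACITY
        while probe in slots:
            probe = (probe + 1) % CAPACITY
        slots[probe] = 'S'

# Print the dataset layout: P = primary, S = secondary, . = empty
print(''.join(slots.get(i, '.') for i in range(CAPACITY)))
```

With truly random hashing, the entries scatter across the dataset and a few of them inevitably collide and become secondaries; that is the normal, healthy picture.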
In the second example, all the entries are primaries and they are all clustered in one area, with no empty space between them. How could this happen?

The reason for this pattern is usually binary keys and transaction numbers. For example, consider typical invoice numbers that start at 030001 and increase by one: 030002, 030003, 030004, etc. If these invoice numbers are stored in a binary field (J2, I2, P8, etc.) instead of an ASCII field (X, Z), IMAGE disables random hashing and calculates the primary location as follows instead: divide the rightmost 32 bits of the binary value by the dataset capacity, and use the remainder as the primary location (dividing by the capacity ensures that you produce an entry number between 0 and the capacity).

If you try a few examples, you will see that monotonically increasing key values produce adjacent entry numbers. Assume that the capacity is 1000:

030001 => 1
030002 => 2
030003 => 3

What is so bad about that? It seems perfect. Each DBFIND or keyed DBGET will take a maximum of one disk read to find each key value! Very fast.
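The placement rule for binary keys can be checked with a few lines of Python. This is a sketch of the rule as described above, not IMAGE's source code:

```python
# Sketch of the placement rule described above for binary keys:
# primary location = (rightmost 32 bits of the key) mod capacity.
CAPACITY = 1000

def primary_location(binary_key: int, capacity: int = CAPACITY) -> int:
    return (binary_key & 0xFFFFFFFF) % capacity

# Monotonically increasing invoice numbers land in adjacent locations:
for invoice in (30001, 30002, 30003, 30004):
    print(invoice, "=>", primary_location(invoice))   # 1, 2, 3, 4
```

Note that 040001 mod 1000 is also 1, which is exactly the collision problem discussed next.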
Unfortunately, the bad part happens when you change the sequence of invoice numbers to 040001, 040002, 040003, ...

040001 => 1

It is quite possible that the new invoice numbers will hash to the same locations as the last batch. This causes a collision. The primary location is already occupied, and DBPUT must find a free space for a secondary entry. But the next physical entry is also occupied. In fact, there may be no free entries for quite a while. The result of this wraparound collision is that DBPUT may have to search thousands of disk pages. And the next DBPUT will look at the original location + 1, then do the long search completely over again! If this happens to you, it can freeze your database and your entire application.

What can you do?

1. Enable Master Dataset Expansion (MDX) on this dataset. I have never actually done this, but it should help quite a bit, because after searching a short time for an empty space, DBPUT will then go to the overflow area instead. However, this solution is not permanent. The overflow entries are on a single chain that will get longer and longer as you add entries, slowing down inquiries. Before that happens, you should implement one of the other solutions below.

2. Increase the capacity of the master dataset and hope that the clusters do not overlap at the new capacity.

3. Change the key field from a binary type to an ASCII type, such as X or Z. This of course means changing all the COBOL copylibs and recompiling all the programs, which may be quite time-consuming.

4. Convert the application to Eloquence on HP-UX. Since it does not use hashing, Eloquence does not have these problems.

For more
articles on TurboIMAGE, go to www.robelle.com/tips/image.html
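The wraparound collision described above, where each DBPUT must scan a long run of occupied entries, can be sketched in Python. This is a simplified model: real DBPUT searches disk pages rather than a Python set, and the cluster size here is invented for the example:

```python
# Simplified model of a wraparound collision (NOT real DBPUT internals).
CAPACITY = 1000

def primary_location(key: int) -> int:
    return (key & 0xFFFFFFFF) % CAPACITY

# A first batch of 500 sequential invoice numbers fills locations 1..500.
occupied = {primary_location(k) for k in range(30001, 30501)}

def insert(key: int) -> int:
    """Place a key, returning how many locations had to be examined."""
    loc = primary_location(key)
    probes = 1
    while loc in occupied:
        loc = (loc + 1) % CAPACITY   # linear search, wrapping at capacity
        probes += 1
    occupied.add(loc)
    return probes

# The first key of the new batch hashes to location 1, which is taken,
# so the search must walk past the entire cluster before finding a free slot:
print(insert(40001))   # 501 locations examined
print(insert(40002))   # the next insert repeats almost the whole search
```

Each new insert starts back at its own primary location and re-scans the cluster, which is why the slowdown compounds as the batch continues; MDX would cap this search by diverting to the overflow area instead.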
Copyright The 3000 NewsWire. All rights reserved.