Anatomy of an IMAGE Database, Part 1
Knowing your database parts keeps you from worrying
Having done tech support for Adager for 10 years, I know many users become extremely concerned (with good reason) when something goes awry with IMAGE. This concern generally stems from an overall lack of understanding about IMAGE, its data structures and methodologies. Knowing about these parts of the body of IMAGE can help resolve your concerns. Luckily, thanks to the initial work done by the original IMAGE team and the continuous enhancement of IMAGE inside HP, things do not go awry very often. In this set of articles we will go inside the IMAGE database and focus on the structure and methodology of IMAGE datasets.

Two types of datasets

In these articles I will address the two types of IMAGE datasets, masters and details. Although they are structurally similar in a global sense, there are many differences between them internally.

The IMAGE dataset as an MPE file

We will begin by using the MPE listf command with the ,2 option on your favorite database, which I will call DB. Type <listf db@,2> at the MPE prompt and you will see the collection of the database file structures: the root file (DB) and all the related datasets (DB01, DB02, DB03, etc.) that make the database complete. Along with the MPE file names, the output shows the file code (always PRIV, a negative number, to prevent inadvertent modification or purging), the record size in words (16-bit words, always in multiples of 128 words), and the EOF and Limit.

The root file is the key to the identity and contents of each individual dataset. Without the root file, the datasets are meaningless collections of bits. The quality of the relationship of the root file to the datasets is tested every time Adager opens a database. If you use Adager, you may be familiar with the Adager note "Consistency Checking" that appears after you enter the database name and press return.
It is during this test that Adager reads the root file (DB) and cross-checks that all the corresponding datasets and data structures exist exactly as the root file defines them. (For more information on the Adager consistency check, see Alfredo Rego's paper on the Internet at www.adager.com in the Technical Papers section.)

We target a particular detail dataset (in my example DB10), the 10th dataset in the database, and type <listf DB10,-3>. (See Figure 1 below.)
The first thing that you may notice is the FILE CODE: -401. The original IMAGE database specification had two file codes, -400 for the root file and -401 for each dataset. Now, with all the enhancements to IMAGE, several other -400 series numbers are employed: some for Omnidex indexes, others for Jumbo datasets, others for IMAGE/SQL data structures or IMAGE b-trees. For the purpose of this paper, we are only going to address -401 and its descriptor -400, the IMAGE root file.

Across from the file code is the dataset's creator, which should be identical to the creator of the root file. In order to do any database repair, maintenance or alterations, you must be logged on as the creator of the database. For the course of this article and its second part next month, I will be addressing the IMAGE dataset as if I am logged on as the creator of the database (i.e., in the group and account of the root file with the user name matching the creator's, just like the creator in the listf,-3 report). The tools we'll use to dissect DB10 will be listf,2 and the Adager command report set. (See Figures 2 and 3 below.)
The output of the report set command begins with:
Adager's set report output begins with the set number (set 10), the file's IMAGE name (INV-DTL), and its set type (Detail). The percent full, number of entries, and number of free entries are also displayed. Next in the report is the set's dimensional information:
The listf,2 output shows that DB10 has a record size of 1920 words; the IMAGE dataset's block length (as seen in this report) is 1882 words. IMAGE blocks are stored in MPE records whose lengths will always be in multiples of 128 words (as created by DBUTIL and managed by Adager). The smallest 128-word multiple large enough to contain this IMAGE block length (1882 words) is 1920 words. IMAGE stores anywhere from one to 255 entries in an IMAGE block/MPE record. An IMAGE block and an MPE record are synonymous, since the MPE record contains the IMAGE block. The number of IMAGE entries per block is called the blocking factor. In the case of DB10 we can see that the blocking factor is 11. Therefore, in this example, 11 IMAGE entries are contained in a single block.

The IMAGE entry

The Adager report set output includes a heading called the Field Layout. Fields contain the information that is meaningful to the user. The Field Layout is a listing of all the fields in each entry, their size in words, the field number, and the offset to the field. At the bottom of the field layout is the total length of all the fields in words; this is called the Entry Length.

One of the strengths of IMAGE is its ability to quickly store, retrieve and modify information contained in the fields. What's more, it does this with extreme efficiency. An IMAGE feature responsible for such rapid access is the path. Paths are physical and logical connections between master datasets and detail datasets. The connections exist to facilitate access to the records. The connections are based upon a particular field having common values in both the master and the detail: the search field. When you look up a particular value in a detail via a search field, IMAGE has the ability to quickly locate the search field in the master.
The master entry contains the search field value, the count of corresponding detail entries with identical search field values, and the location of the first and last entries in the detail (backward and forward pointers). The detail entries point at each other forward and backward and, together with the chainhead in the master, make up what is known as a chain. The collection of all chains related to the same search field in the dataset is called a path. An IMAGE dataset can have anywhere from zero to 16 paths. The Adager set report output in our example tells us that DB10 has two paths.

Each path in a detail dataset requires four words (master datasets differ, and I will discuss them next issue). These four words are used for the backward pointer and the forward pointer (two words per pointer). IMAGE pointers in detail datasets point backward to the previous detail entry with the identical search field value and forward to the next detail entry with the identical search field value. The first entry in the chain has a backward pointer of zero, and the last entry in the chain has a forward pointer of zero. In the case of a single-entry chain, the detail entry's backward and forward pointers will both be zero, the chainhead will have a count of 1, and the chainhead's backward and forward pointers will be identical, both pointing to the lone detail entry.

This pointer information leads us to the IMAGE data structure called the media entry, which is also available to us in the report set listing. The media entry length in detail datasets consists of the entry length (sum of all the fields) plus the pointer information for each path. Navigating IMAGE chains is easy with the Adager command examine chain.

Pointers

In detail datasets, two words (32 bits) are used for the backward pointer and two words (32 bits) for the forward pointer. IMAGE stores the entry number in 32 bits.
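The chain described above (a chainhead in the master, plus backward and forward pointers in each detail entry) can be modeled roughly as follows. This is only an illustrative sketch in Python, not IMAGE's on-disk format: the names ChainHead, DetailEntry, append_to_chain and walk_chain are hypothetical, entry numbers are plain integers, and zero stands for "no entry" as in the text.

```python
from dataclasses import dataclass


@dataclass
class DetailEntry:
    """Pointer pair for one path of a detail entry (hypothetical model)."""
    backward: int = 0  # previous entry with the same search value, 0 = none
    forward: int = 0   # next entry with the same search value, 0 = none


@dataclass
class ChainHead:
    """Chain information kept in the master entry (hypothetical model)."""
    count: int = 0  # number of detail entries on the chain
    first: int = 0  # entry number of the first detail entry, 0 = empty chain
    last: int = 0   # entry number of the last detail entry, 0 = empty chain


def append_to_chain(head, entries, entry_number):
    """Link a new detail entry at the end of the chain."""
    # The new entry points back at the old last entry (0 if the chain is empty).
    entry = DetailEntry(backward=head.last, forward=0)
    if head.count == 0:
        head.first = entry_number
    else:
        entries[head.last].forward = entry_number
    entries[entry_number] = entry
    head.last = entry_number
    head.count += 1


def walk_chain(head, entries):
    """Follow forward pointers from the chainhead until a zero pointer."""
    numbers = []
    current = head.first
    while current != 0:
        numbers.append(current)
        current = entries[current].forward
    return numbers
```

After a single append, the entry's two pointers are both zero and the chainhead's first and last values are identical, which is the single-entry case described above.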
The 32-bit entry number is signed (HP has changed this to an unsigned value in the latest IMAGE release), which leaves 31 bits available for storing the address. Of the 31 available bits, 23 bits are used for denoting the block number and eight bits are used to denote the offset within the block. Therefore the maximum number of blocks that can exist in an IMAGE dataset is 8388607 [(2**23)-1] (zero based), and the maximum blocking factor is 255 [(2**8)-1]. This method of storing record numbers is called the IMAGE record name.

Blocks

A block is a collection of IMAGE entries. It can contain from one to 255 IMAGE entries, governed by the blocking factor. In DB10 we have an entry length of 163 words (sum of all fields) and a media entry length of 171 words (sum of all fields plus pointer information for two paths). The Adager set report reveals that the blocking factor is 11, so IMAGE stores 11 records of 171 words each per block: 171 * 11 = 1881. But a review of the report set output tells us that the block length is 1882.

The reason for this is that at the front of each block (beginning at the zeroth word), IMAGE maintains a bit map. The bit map uses one 16-bit word for every 16 entries, or fraction thereof, in the blocking factor. In the case of a blocking factor of 11, one word is all that is necessary, since one word can accommodate a blocking factor from one to 16. If the blocking factor were 17, two words would be required for the bit map; if it were 33, three words would be required, and so on. The bit map is the guide IMAGE uses to determine which records in the block are in use, that is, the ones that are counted against the number of free entries in the dataset. In fact, IMAGE relies on the bit map to determine the existence of a record. If the bit is off (not in use), the record is considered to be non-existent. So the IMAGE block consists of the bit map plus the [media entry length * blocking factor].
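The arithmetic in this section can be checked with a short sketch. It is written in Python purely for illustration, and every function name is hypothetical. It also assumes that the block number sits in the high-order bits of the record name and the offset in the low-order eight bits, which the text above does not spell out.

```python
BLOCK_BITS = 23   # bits available for the block number
OFFSET_BITS = 8   # bits available for the offset within the block
WORDS_PER_PATH = 4  # backward pointer + forward pointer, two words each

MAX_BLOCK = (1 << BLOCK_BITS) - 1    # 8388607
MAX_OFFSET = (1 << OFFSET_BITS) - 1  # 255


def pack_record_name(block, offset):
    """Combine a block number and an offset into one 31-bit value.

    Assumed layout: block number in the high bits, offset in the low 8 bits.
    """
    if not 0 <= block <= MAX_BLOCK:
        raise ValueError("block number out of range")
    if not 0 <= offset <= MAX_OFFSET:
        raise ValueError("offset out of range")
    return (block << OFFSET_BITS) | offset


def unpack_record_name(name):
    """Split a packed value back into (block number, offset)."""
    return name >> OFFSET_BITS, name & MAX_OFFSET


def media_entry_length(entry_length, paths):
    """Detail media entry length in words: fields plus pointers per path."""
    return entry_length + WORDS_PER_PATH * paths


def bitmap_words(blocking_factor):
    """One 16-bit word per 16 entries, rounded up."""
    return (blocking_factor + 15) // 16


def block_length(media_length, blocking_factor):
    """Bit map plus (media entry length * blocking factor), in words."""
    return bitmap_words(blocking_factor) + media_length * blocking_factor


def mpe_record_size(block_len, multiple=128):
    """Smallest multiple of 128 words that can hold the block."""
    return -(-block_len // multiple) * multiple
```

With the DB10 figures, media_entry_length(163, 2) gives 171, block_length(171, 11) gives 1882, and mpe_record_size(1882) gives 1920, matching the numbers in the report and the listf output.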
The block is stored in an MPE record whose size is the smallest multiple of 128 words that can hold the block. For a block of 1882 words this record is 1920 words (note the size shown in the listf output). The difference between the 1920-word MPE record and the 1882-word IMAGE block is 38 words, meaning that for every 11 IMAGE entries there are 38 wasted 16-bit words. The goal is to keep the waste to a minimum. The Adager command Reblock helps obtain the optimum blocking factor, but, despite our best efforts, there can be wasted disk space.

In Part 2: Knowing the parts of an IMAGE dataset that can help manage database capacities.
Copyright 1998, The 3000 NewsWire. All rights reserved.