NewsWire TestDrive: Discover a new tool to search 3000 data and source

Discover/3000 version 1.24
Impact Digital Solutions
130 Bradford St.
San Francisco, Ca. 94110
Phone: 415.642.8015
Fax: 415.282.1947
e-mail: info@idswest.com
Web: www.idswest.com

Discover/3000 includes all the software required to run on your HP e3000. If you have WRQ Reflection you can also use their RBS file for a command picklist. A diskette is included that contains MS Word copies of the manual. A free 30-day trial of the software is available at the company’s Web site.

Discover/3000 runs on all versions of MPE/iX. The software for the HP e3000 is priced per copy: the first copy is $1,800, and additional copies are $900 each. Support costs $325 per year for the first copy and $160 per year for each additional copy, and includes phone-in and electronic support and new releases of the software. All prices are in US dollars.

July 2000
Discover a tool to search 3000 data and source

New Discover/3000 uses fuzzy logic to sharpen MPE programmers’ results

Review by Shawn Gordon

At its heart, Discover/3000 is a sophisticated search tool. You can search through any type of file, including an IMAGE dataset, looking for various types of data. Discover is very big on allowing you to create custom pattern matches for both inclusion and exclusion to facilitate your search. Discover/3000 is an extension of a Y2K tool that Impact Digital Solutions sold in the US and Europe during the past few years. That Y2K mission shows in the commands and focus of the product. That’s not to say it isn’t useful; I mention it just to give you some background on its origin.

With a few simple commands, you can locate all of the records that contain references to user-defined search strings. You can define up to 32 different wildcard character strings and up to 64 “excluded” strings, because you don’t want to see VALIDATE when you are looking for DATE. You can scan a variety of file types, either alone or as a group. One limitation: Discover can only scan files whose records are up to 1024 bytes wide. Both summary and detail reports are produced based on the results of the search.
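To make the inclusion and exclusion idea concrete, here is a minimal Python sketch. It is not Discover syntax and makes no claim about how the product works internally; the function name matching_records and the sample COBOL lines are invented for illustration. It assumes an excluded string simply hides its own characters from the search, so a record is still reported if the search string also appears somewhere else in it.

```python
def matching_records(records, include, exclude):
    """Return (record_number, text) pairs that contain a search string
    outside of any excluded string. Hypothetical helper, not Discover's API."""
    hits = []
    for number, text in enumerate(records, start=1):
        upper = text.upper()
        # Blank out each excluded string so it cannot satisfy a search string.
        for word in exclude:
            upper = upper.replace(word.upper(), " " * len(word))
        if any(word.upper() in upper for word in include):
            hits.append((number, text))
    return hits


# Made-up source lines: only records 1 and 3 mention DATE outside VALIDATE.
source = [
    "MOVE WS-DATE TO OUT-DATE.",
    "PERFORM VALIDATE-INPUT.",
    "IF VALIDATE-FLAG = 1 MOVE DUE-DATE TO X.",
]

for number, text in matching_records(source, ["DATE"], ["VALIDATE"]):
    print(number, text)
```

The second record is skipped because its only occurrence of DATE sits inside VALIDATE, which is the situation the exclusion list is meant to handle.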

Discover automatically recognizes COBOL, JOB, and COPYLIB source files. The text-matching capabilities can be used to analyze documents, source code, JOB streams, UDCs, command files, and pretty much any other text-based file. Discover can also locate specific text references such as names, cities, countries, depot codes, distribution centers, etc.

Discover can also scan IMAGE datasets to identify the data items that contain user-definable dates or patterns. You can analyze an entire IMAGE database in one pass or just one set at a time. Over 100 date formats and types are recognized, which is a heck of a lot more than I can think of.

Empty dates and patterns and specially coded dates (such as all 9s) are also recognized. You are also able to define what is empty or special for each type and format of date. And finally, you can specify a percent certainty for a date or pattern to be considered valid. This kind of fuzzy logic is pretty slick.

You can validate and extract data from MPE files and IMAGE datasets that either match or do not match a user-defined pattern or date. Then you can convert and reformat dates in over 100 different formats and data types. Discover also has a PREVIEW mode that allows you to do a dry run on your file to check the results prior to an actual date conversion.

How does it work?

Discover is a command line application that works rather like Robelle’s Suprtool or Warehouse. That means you set up all your pattern matches, parameters, and search criteria, then execute it. You can also save the script so that it is executable again and again. There is a nifty little Reflection Basic program that will allow you to click and pick keywords from a pop-up window, but since I don’t have or use Reflection, I couldn’t test this feature.

There aren’t any magic tricks going on in Discover like MR NOBUF IO or special sort routines — but because of the nature of the scripting engine it’s about as fast as a program written in a 3GL. The Discover interpreter is also a process-handling environment, so you can issue MPE commands inside the tool. Not only can you issue MPE commands, but they can be saved and used as part of a script; you just have to preface the command with a colon. Below is an example:

use extract.script
SCRIPT
HEAD
* SCRIPT to build a file of the names of all of the extract files from
* the STRING SEARCH commands. User is prompted for the type of source
****************** define SCRIPT items ******************
IT FNAME X26
IT STYPE X08
IT SOURCETYPE X08
:ECHO Script to build a file of the names of all of the the D3K
:ECHO EXTRACT files of a particular type
:PURGE XLIST >$null
:PURGE YLIST,TEMP >$null
* The input file must be permanent
:BUILD XLIST;REC=-36,,F,ASCII;DISC=10000
:FILE XLIST=XLIST,OLD
***************** get the desired source type ****************
:SETVAR SOURCETYPE "COBOL"
:INPUT SOURCETYPE;PROMPT="Enter SOURCE type of extract files [COBOL] "
:SETVAR SOURCETYPE LTRIM(UPS("!SOURCETYPE "))
:SETVAR SPACEPOS 0
:SETVAR SPACEPOS POS(" ","!SOURCETYPE",1)
:IF SPACEPOS > 0
: SETVAR SOURCETYPE STR("!SOURCETYPE",1,SPACEPOS-1)
:ENDIF
:ECHO Looking for !SOURCETYPE extract files

Features

I was struck by how similar the settings syntax is to Quiz. It could just be a coincidence, but here is a little snippet so you can judge for yourself.

DEFINE MACHINE HP927LX
DEFINE COMPANY
DEFINE SLIST SLIST.IDSINC
DEFINE DLIST DLIST.IDSINC
DEFINE PLIST PLIST.IDSINC
DEFINE FLIST FLIST.IDSINC
DEFINE EXTRACT EXTRACT.IDSINC
SET PROMPT YES
SET PREVIEW NO
DATES ASCII BINARY PACKED
SET DELIMITER /
SET ASCII SAME
SET PACKED SAME
MATCH FIRST
LOG
LOG EMPTY
LOG SPECIAL
ANLR1 19 62/99
CCYYRANGE 1962/1999
CNVR1 19 10/99
CNVR2 20 00/09

The pattern match is the heart and strength of Discover. With this you can specify all sorts of fun things to look for, as well as ranking for probability of a match.

To locate telephone numbers:

DEFINE PATTERN (###)###-####
DEFINE PATTERN (###) ###-####
DEFINE PATTERN ###-###-####
DEFINE PATTERN 1-###-###-####
DEFINE PATTERN ###.###.####
DEFINE PATTERN (###)###.####

To locate social security numbers or tax IDs:

DEFINE PATTERN ###-##-####
DEFINE PATTERN ##-#######

You can almost think of these as COBOL edit masks, but in this case you can find data that conforms to the pattern you have defined. A couple of other examples are the Pattern Match Files (PMF) for MPE files and Pattern Match Sets (PMS) for Image Data Sets.
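One way to picture what such a mask does is to translate it into a regular expression. The Python sketch below is only an illustration, not Discover's implementation: it assumes that # stands for a single digit and that every other character must match literally, and the names pattern_to_regex and find_patterns, along with the sample text, are invented for the example.

```python
import re


def pattern_to_regex(pattern):
    """Translate a mask such as ###-##-#### into a compiled regex.
    Assumption: '#' means one digit; all other characters are literal."""
    parts = [r"\d" if ch == "#" else re.escape(ch) for ch in pattern]
    # The lookarounds stop a match from being part of a longer run of digits.
    return re.compile(r"(?<!\d)" + "".join(parts) + r"(?!\d)")


def find_patterns(text, patterns):
    """Return (pattern, matched_text) pairs for every match in text."""
    found = []
    for pattern in patterns:
        for match in pattern_to_regex(pattern).finditer(text):
            found.append((pattern, match.group()))
    return found


# Made-up sample record containing two phone formats and one tax ID.
record = "Call (415)555-0100 or fax 415.555.0199, tax ID 12-3456789"
masks = ["(###)###-####", "###.###.####", "##-#######"]

for mask, text in find_patterns(record, masks):
    print(mask, "->", text)
```

A real tool would also need the certainty ranking described in the review; this sketch only shows the mask-to-match step.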

PMF AT@.D@.P@ 75 P would search through the file set AT@.D@.P@ looking for a 75 percent certainty for patterns that were defined. I found the certainty percentage a little confusing. To quote from the manual:

“The PATTERN MATCH FILES and PATTERN MATCH SETS commands require you to specify a percent certainty that represents the ratio of valid dates or patterns to all dates or patterns at a specific location in a data file or data set. This percent allows Discover/3000 to locate dates and patterns where some or even most of the data is empty or invalid. It is expressed as a percent. A high value for certainty may not locate valid date fields if some of the dates are invalid, whereas a low value may result in ‘false’ locating of dates. The DATE ranges and the certainty must be defined to produce the best results.

“A good percent to start with is 80. For DATES: CERTAINTY = (# valid dates) / (# non-empty dates) For PATTERNS: CERTAINTY = (# valid patterns) / (# non-empty patterns).”
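A small worked example may help with the formula in the manual's quote. The Python sketch below computes the ratio for one field position across a handful of made-up records. The YYMMDD layout, the rule that blanks and all zeros count as empty, and the use of "greater than or equal to" for the threshold are all assumptions for illustration; the review does not say how Discover handles these details.

```python
from datetime import datetime


def is_empty(value):
    # Assumption: blanks and all zeros are the "empty" values for this field.
    return value.strip() == "" or set(value) == {"0"}


def is_valid_yymmdd(value):
    # A value is valid if it parses as a calendar date in YYMMDD form.
    try:
        datetime.strptime(value, "%y%m%d")
        return True
    except ValueError:
        return False


def certainty(values):
    """Percent of non-empty values that are valid dates."""
    non_empty = [v for v in values if not is_empty(v)]
    if not non_empty:
        return 0.0
    valid = sum(1 for v in non_empty if is_valid_yymmdd(v))
    return 100.0 * valid / len(non_empty)


# Seven made-up values found at the same position in seven records:
# two are empty, four are valid dates, one is junk.
field = ["990131", "000000", "991231", "ABCDEF", "      ", "980615", "970101"]

score = certainty(field)
print(score)          # 4 valid out of 5 non-empty
print(score >= 80)    # would this position qualify at the suggested 80?
```

With four valid dates among five non-empty values, the position scores 80 percent, so it would just qualify at the manual's suggested starting value under the assumptions above. Note that the two empty values do not count against it.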

While this seems like an interesting feature, I wasn’t entirely sure how to apply it to my tests.

One of the more powerful, and labor-intensive, features of Discover is the DISCRIPT scripting language. The basic structure of a script looks like this:

SCRIPT
HEADER
Header commands
BODY
Body commands
TRAILER
Trailer commands
ENDSCRIPT

The HEADER commands define the data items, the input file, and the output file. They can also write records to the output file before processing the records in the input file. You could use this as a report writer-type feature if you wanted. The maximum record size of the output file is 1024 bytes. This section is where you would put your file equates and the :BUILD or :COPY command to create the output file. The output file MUST exist prior to the first WRITE statement.

BODY commands are processed for each record in an input file. The records are written to the output file depending on the logic of the BODY commands. The command to write an output record is, logically enough, WRITE. The output file record is created via NEWREC commands. Multiple output records or no output records may be produced for each input record.
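The processing model described above can be sketched in a few lines of Python. This is not DISCRIPT and not Discover's implementation; run_script, header, body, and trailer are hypothetical names. It assumes the header runs once before the first input record and, since the review does not describe the TRAILER section, that the trailer runs once after the last one.

```python
MAX_RECORD_BYTES = 1024  # output record limit mentioned in the review


def run_script(input_records, header, body, trailer):
    """Run header once, body once per input record, then trailer once.
    Each callback returns a list of output records (possibly empty)."""
    output = []

    def write(record):
        if len(record.encode()) > MAX_RECORD_BYTES:
            raise ValueError("output record exceeds 1024 bytes")
        output.append(record)

    for record in header():
        write(record)
    selected = 0
    for record in input_records:
        # The body may produce zero, one, or several records per input.
        for out in body(record):
            write(out)
            selected += 1
    for record in trailer(selected):
        write(record)
    return output


def header():
    return ["RECORDS MENTIONING DATE"]


def body(record):
    return [record.strip()] if "DATE" in record else []


def trailer(selected):
    return [f"{selected} records selected"]


lines = ["MOVE WS-DATE TO X.", "DISPLAY 'HELLO'.", "  ACCEPT DUE-DATE."]
for out in run_script(lines, header, body, trailer):
    print(out)
```

Writing a title line in the header and a count in the trailer is the kind of report-writer use the review hints at.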

I don’t want to regurgitate the entire scripting manual, but there is support for variable declarations and conditional constructs (IF..ELSE..ENDIF), as well as for string manipulation for changing the value of items and updating items. There are a number of predefined variables such as $DATE and $TIME.

Installation and Documentation

Installation is simple and straightforward. Restore the installation job and stream it. The job creates the accounting structure, restores the files, and then very kindly cleans up after itself.

There are two manuals. The user guide weighs in at 175 pages, and the scripting guide is 45 pages. Both are well-written and easy to use. The user guide is organized and has a table of contents and such, while the scripting guide is more of an alphabetized reference manual with some samples at the back.

There was one nit that I would pick with Discover. Its first version has some help tailored only for the WRQ terminal emulator user. The manual recommends you set your screen display to 126 rows by 36 columns and gives directions for doing it in Reflection. They also include an RBA (Reflection Basic) script that allows you to pop in Discover keywords from a pick list. While this is a nice touch, it again leaves out the huge number of people using other emulators.

The Test Drive

The paradigm employed by Discover is a little different than what I’m used to, but it only took a short time to become familiar with it. I spent some time scanning through my source code and databases looking for particular records. Since one of my projects is an e-mail system, I was able to do lots of date and pattern matching with this type of data.

Since I didn’t have any other real-world problems to solve, I mostly just did some random research to test the various features of the product. Everything worked as advertised, and performance was just fine.

Conclusions

I was a little skeptical of Discover at first, because it was unclear to me what it was trying to be. After spending some time with it I came away with two basic impressions. First, there are an awful lot of arbitrary limitations in the product, such as the number of search and exclusion parameters, the record width of a file being read, and the limit on the length of commands in a job stream. I don’t think it would be that much harder to remove those limitations altogether.

That said, my second impression is that Discover is one of those tools that is probably a hard sell, because you don’t really realize how useful it is until you become accustomed to using it and take advantage of it to solve your problems. There is a significant amount of power and versatility in Discover, but it’s not something that everyone is going to need. This product’s ability to do sophisticated text and data searches, as well as data transformations, is a pretty slick thing to have. I suggest you spend some time analyzing the kinds of troubleshooting work you do. It might be that Discover can help you in the development and maintenance of your HP 3000.