Review by Shawn Gordon
At its heart, Discover/3000 is a
sophisticated search tool. You can search through any type of file,
including an IMAGE dataset, looking for various types of data.
Discover is very big on allowing you to create custom pattern matches
for both inclusion and exclusion to facilitate your search.
Discover/3000 is an extension of a Y2K tool that Impact Digital
Solutions sold in the US and Europe during the past few years. That
Y2K mission shows in the commands and focus of the product.
That's not to say it isn't useful; it's just some
background on its origin.
With a few simple commands, you can locate
all of the records that contain references to user-defined search
strings. You can define up to 32 different wildcard character strings
and up to 64 excluded strings; the exclusions keep unwanted matches
out of the results, so you won't see VALIDATE when you are looking for
DATE. You can scan a variety of file types, either alone or as a group.
One limitation is that Discover can only scan files whose records are
1024 bytes or less. Both summary and detail reports are produced from
the results of the search.
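To illustrate how search and exclusion strings interact, here is a minimal Python sketch. It is not Discover's implementation: it treats search strings as plain, case-insensitive substrings (no wildcards), and the names find_hits, Hit, and _occurrences are hypothetical. An occurrence of a search string is dropped when it overlaps any occurrence of an excluded string, which is how DATE inside VALIDATE gets suppressed while a standalone DATE still counts. The 32 and 64 limits are the ones quoted above.

```python
from dataclasses import dataclass

MAX_SEARCH = 32   # search-string limit quoted in the review
MAX_EXCLUDE = 64  # exclusion-string limit quoted in the review


@dataclass
class Hit:
    record_no: int  # 1-based record number
    search: str     # the search string that matched
    text: str       # the whole record, for a detail report


def _occurrences(text, needle):
    """Yield the start index of every (possibly overlapping) occurrence."""
    start = text.find(needle)
    while start != -1:
        yield start
        start = text.find(needle, start + 1)


def find_hits(records, search, exclude=()):
    """Return one Hit per record and search string, ignoring excluded text."""
    if len(search) > MAX_SEARCH or len(exclude) > MAX_EXCLUDE:
        raise ValueError("too many search or exclusion strings")
    hits = []
    for record_no, record in enumerate(records, start=1):
        upper = record.upper()
        # Mark every character position covered by an excluded string.
        masked = set()
        for ex in exclude:
            ex = ex.upper()
            for pos in _occurrences(upper, ex):
                masked.update(range(pos, pos + len(ex)))
        for s in search:
            s = s.upper()
            for pos in _occurrences(upper, s):
                # Keep the occurrence only if none of it lies inside an exclusion.
                if masked.isdisjoint(range(pos, pos + len(s))):
                    hits.append(Hit(record_no, s, record))
                    break  # one hit per search string per record is enough
    return hits
```

Searching the records "MOVE WS-DATE TO OUT-DATE", "PERFORM VALIDATE-INPUT", and "CALL 'VALIDATE' USING DATE-IN" for DATE while excluding VALIDATE reports records 1 and 3; the second record only contains DATE as part of VALIDATE.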
Discover automatically recognizes COBOL,
JOB, and COPYLIB source files. The text-matching capabilities can be
used to analyze documents, source code, JOB streams, UDCs, command
files, and pretty much any other text-based file. Discover can also
locate specific text references such as names, cities, countries,
depot codes, distribution centers, etc.
Discover can also scan IMAGE datasets to
identify the data items that contain the dates or patterns which are
user definable. You can analyze an entire IMAGE database in one pass
or just one set at a time. Over 100 date formats and types are
recognized, which is a heck of a lot more than I can think of.
Empty dates and patterns and specially
coded dates (such as all 9s) are also recognized. You are also able to
define what is empty or special for each type and format of date. And
finally, you can specify a percent certainty for a date or pattern to
be considered valid. This kind of fuzzy logic is pretty slick.
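The empty/special/valid distinction can be pictured with a short Python sketch for a single six-character date field. The empty and special values shown (blanks, all zeros, all 9s), the default YYMMDD format, and the function name classify are assumptions for illustration; in Discover you define these per date type and format.

```python
from datetime import datetime

# Assumed user-defined values; Discover lets you set these per date type/format.
EMPTY = ("      ", "000000")
SPECIAL = ("999999",)


def classify(value, fmt="%y%m%d", empty=EMPTY, special=SPECIAL):
    """Classify a 6-character date field as empty, special, valid or invalid."""
    if value in empty:
        return "empty"
    if value in special:
        return "special"
    # strptime accepts one-digit months/days, so insist on exactly 6 digits.
    if len(value) != 6 or not value.isdigit():
        return "invalid"
    try:
        datetime.strptime(value, fmt)
    except ValueError:
        return "invalid"
    return "valid"
```

Passing a different format string, such as "%m%d%y", covers another six-digit layout with the same logic.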
You can validate and extract data from MPE
files and IMAGE datasets that either match or do not match a
user-defined pattern or date. Then you can convert and reformat dates
in over 100 different formats and data types. Discover also has a
PREVIEW mode that allows you to do a dry run on your file to check
the results prior to an actual date conversion.
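Here is a minimal sketch of a YYMMDD-to-CCYYMMDD conversion with a preview (dry-run) switch. The century window is an assumption: it follows one plausible reading of the CNVR1 19 10/99 and CNVR2 20 00/09 settings shown later in this review (two-digit years 10 through 99 become 19xx, and 00 through 09 become 20xx). The names WINDOW, expand_year, convert, and convert_all are hypothetical.

```python
from datetime import date

# Assumed century window: (first YY, last YY, century).
WINDOW = ((0, 9, 20), (10, 99, 19))


def expand_year(yy):
    """Map a two-digit year to four digits using the window above."""
    for low, high, century in WINDOW:
        if low <= yy <= high:
            return century * 100 + yy
    raise ValueError(f"year {yy:02d} is outside the window")


def convert(yymmdd):
    """Convert YYMMDD to CCYYMMDD; date() raises ValueError on a bad date."""
    yy, mm, dd = int(yymmdd[:2]), int(yymmdd[2:4]), int(yymmdd[4:6])
    return date(expand_year(yy), mm, dd).strftime("%Y%m%d")


def convert_all(values, preview=True):
    """In preview mode, print the planned changes and return values unchanged."""
    planned = [(old, convert(old)) for old in values]
    if preview:
        for old, new in planned:
            print(f"{old} -> {new}")
        return list(values)
    return [new for _, new in planned]
```

Running convert_all with preview=True first mirrors the idea behind Discover's PREVIEW mode: every planned change is visible before anything is rewritten.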
How does it work?
Discover is a command-line application
that works rather like Robelle's Suprtool or Warehouse. That
means you set up all your pattern matches, parameters, and search
criteria, then execute them. You can also save the script so that it
can be run again and again. There is a nifty little Reflection Basic
program that lets you click and pick keywords from a pop-up
window, but since I don't have or use Reflection, I
couldn't test this feature.
There aren't any magic tricks going
on in Discover, such as MR NOBUF I/O or special sort routines, but
because of the nature of the scripting engine it's about as fast
as a program written in a 3GL. The Discover interpreter is also a
process-handling environment, so you can issue MPE commands inside
the tool. Not only can you issue MPE commands, but they can be saved
and used as part of a script; you just have to preface each command
with a colon. Below is an example:
use extract.script
SCRIPT
HEAD
* SCRIPT to build a file of the names of all of the extract files from
* the STRING SEARCH commands. User is prompted for the type of source
****************** define SCRIPT items ******************
IT FNAME       X26
IT STYPE       X08
IT SOURCETYPE  X08
:ECHO Script to build a file of the names of all of the D3K
:ECHO EXTRACT files of a particular type
:PURGE XLIST >$null
:PURGE YLIST,TEMP >$null
* The input file must be permanent
:BUILD XLIST;REC=-36,,F,ASCII;DISC=10000
:FILE XLIST=XLIST,OLD
***************** get the desired source type ****************
:SETVAR SOURCETYPE "COBOL"
:INPUT SOURCETYPE;PROMPT="Enter SOURCE type of extract files [COBOL] "
:SETVAR SOURCETYPE LTRIM(UPS("!SOURCETYPE "))
:SETVAR SPACEPOS 0
:SETVAR SPACEPOS POS(" ","!SOURCETYPE",1)
:IF SPACEPOS > 0
: SETVAR SOURCETYPE STR("!SOURCETYPE",1,SPACEPOS-1)
:ENDIF
:ECHO Looking for !SOURCETYPE extract files
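The script above mixes three kinds of lines: comments that begin with an asterisk, MPE commands prefaced with a colon, and Discover's own commands such as IT. A hypothetical Python sketch of how an interpreter might sort them follows; it only classifies lines and executes nothing, and split_script is an invented name rather than anything in Discover.

```python
def split_script(lines):
    """Sort script lines into comments, MPE commands, and Discover commands."""
    comments, mpe, native = [], [], []
    for line in lines:
        text = line.strip()
        if not text:
            continue                      # ignore blank lines
        if text.startswith("*"):
            comments.append(text)         # comment line
        elif text.startswith(":"):
            mpe.append(text[1:].strip())  # strip the colon; the rest is for MPE
        else:
            native.append(text)           # a Discover command such as IT
    return comments, mpe, native


if __name__ == "__main__":
    sample = [
        "* The input file must be permanent",
        "IT FNAME X26",
        ":PURGE XLIST >$null",
        ":ECHO Looking for !SOURCETYPE extract files",
    ]
    comments, mpe, native = split_script(sample)
    print(mpe)  # ['PURGE XLIST >$null', 'ECHO Looking for !SOURCETYPE extract files']
```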
Features
I was struck by how similar the settings
syntax is to Quiz. It could just be a coincidence, but here is a
little snippet so you can judge for yourself.
DEFINE MACHINE HP927LX
DEFINE COMPANY
DEFINE SLIST SLIST.IDSINC
DEFINE DLIST DLIST.IDSINC
DEFINE PLIST PLIST.IDSINC
DEFINE FLIST FLIST.IDSINC
DEFINE EXTRACT EXTRACT.IDSINC
SET PROMPT YES
SET PREVIEW NO
DATES ASCII BINARY PACKED
SET DELIMITER /
SET ASCII SAME
SET PACKED SAME
MATCH FIRST
LOG
LOG EMPTY
LOG SPECIAL
ANLR1 19 62/99
CCYYRANGE 1962/1999
CNVR1 19 10/99
CNVR2 20 00/09
The pattern match is the heart and
strength of Discover. With this you can specify all sorts of fun
things to look for, as well as ranking for probability of a match.
To locate telephone numbers:
DEFINE PATTERN (###)###-####
DEFINE PATTERN (###) ###-####
DEFINE PATTERN ###-###-####
DEFINE PATTERN 1-###-###-####
DEFINE PATTERN ###.###.####
DEFINE PATTERN (###)###.####
To locate Social Security numbers or tax IDs:
DEFINE PATTERN ###-##-####
DEFINE PATTERN ##-#######
You can almost think of these as COBOL
edit masks, except that instead of formatting data you use them to find
data that conforms to the pattern you have defined. Two related commands
are PATTERN MATCH FILES (PMF) for MPE files and PATTERN MATCH SETS (PMS)
for IMAGE datasets.
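One way to picture these patterns is to translate each into a regular expression in which # stands for a single digit and every other character must match literally. This Python sketch assumes that reading of the syntax; pattern_to_regex, PHONE_PATTERNS, and find_patterns are hypothetical names, and the digit lookarounds (which stop a pattern from matching in the middle of a longer number) are a design choice of the sketch, not documented Discover behavior.

```python
import re


def pattern_to_regex(pattern):
    """Compile a pattern in which '#' is one digit and all else is literal."""
    body = "".join(r"\d" if ch == "#" else re.escape(ch) for ch in pattern)
    # Refuse to match inside a longer run of digits.
    return re.compile(r"(?<!\d)" + body + r"(?!\d)")


PHONE_PATTERNS = [pattern_to_regex(p) for p in (
    "(###)###-####", "(###) ###-####", "###-###-####",
    "1-###-###-####", "###.###.####", "(###)###.####",
)]


def find_patterns(text, patterns=PHONE_PATTERNS):
    """Return every substring of text that matches one of the patterns."""
    return [m.group(0) for p in patterns for m in p.finditer(text)]
```

Overlapping patterns such as ###-###-#### and 1-###-###-#### can both report the same number; deduplication is left out to keep the sketch short.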
PMF AT@.D@.P@ 75 P searches the file set
AT@.D@.P@ for the defined patterns, requiring 75 percent certainty
before a location counts as a match. I found the certainty percentage
a little confusing. To quote from the manual:
The PATTERN MATCH FILES and PATTERN
MATCH SETS commands require you to specify a percent certainty that
represents the ratio of valid dates or patterns to all dates or
patterns at a specific location in a data file or data set. This
percent allows Discover/3000 to locate dates and patterns where some
or even most of the data is empty or invalid. It is expressed as a
percent. A high value for certainty may not locate valid date fields
if some of the dates are invalid, whereas a low value may result in
false locating of dates. The DATE ranges and the
certainty must be defined to produce the best results.
A good percent to start with is 80.
For DATES: CERTAINTY = (# valid dates) / (# non-empty dates)
For PATTERNS: CERTAINTY = (# valid patterns) / (# non-empty patterns)
While this seems like an interesting
feature, I wasn't entirely sure how to apply it to my tests.
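A worked example may help with the certainty ratio. This Python sketch applies the manual's formula to a list of per-record results for one field location; the status strings and the names certainty and is_date_location are assumptions. Only "empty" entries are left out of the denominator, and because the quote does not say how specially coded dates are counted, the sketch does not model them.

```python
def certainty(statuses):
    """Percent certainty = valid / non-empty * 100, following the manual's formula.

    Each status is 'valid', 'invalid', or 'empty' for one record at one location.
    """
    non_empty = [s for s in statuses if s != "empty"]
    if not non_empty:
        return 0.0  # nothing to judge, so report no evidence of a date
    valid = sum(1 for s in non_empty if s == "valid")
    return 100.0 * valid / len(non_empty)


def is_date_location(statuses, threshold=80.0):
    """Accept the location if its certainty meets the threshold (80 by default)."""
    return certainty(statuses) >= threshold
```

For example, a location where 6 records hold valid dates, 2 hold invalid ones, and 2 are empty scores 6 / 8 = 75 percent: enough for the 75 in the PMF example above, but short of the suggested starting value of 80.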
One of the more powerful, and labor-intensive,
features of Discover is the DISCRIPT scripting language.
Its basic structure looks like this:
SCRIPT
HEADER
Header commands
BODY
Body commands
TRAILER
Trailer commands
ENDSCRIPT
The HEADER commands define the data items,
the input file, and the output file. They can also write records to
the output file before processing the records in the input file. You
could use this as a report writer-type feature if you wanted. The
maximum record size of the output file is 1024 bytes. This section is
where you would put your file equates and the :BUILD or :COPY command
to create the output file. The output file MUST exist prior to the
first WRITE statement.
BODY commands are processed for each
record in an input file. The records are written to the output file
depending on the logic of the BODY commands. The command to write an
output record is, logically enough, WRITE. The output file record is
created via NEWREC commands. Multiple output records or no output
records may be produced for each input record.
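The HEADER/BODY/TRAILER flow resembles a classic report-writer loop. A minimal Python sketch of that control flow follows, with each section modeled as a function that returns zero or more output records; run_script and the section functions are hypothetical and say nothing about DISCRIPT's actual syntax. The 1024-byte limit is the output record maximum quoted above.

```python
MAX_RECORD = 1024  # maximum output record size quoted in the review


def run_script(records, header, body, trailer):
    """Header once, body once per input record, trailer once.

    Each section returns a list of output records, so BODY may emit
    zero, one, or several records for a single input record.
    """
    out = list(header())
    for record in records:
        out.extend(body(record))
    out.extend(trailer())
    for line in out:
        if len(line.encode("utf-8")) > MAX_RECORD:
            raise ValueError("output record exceeds 1024 bytes")
    return out
```

A body that returns [record] only when the record contains DATE, and an empty list otherwise, behaves like a simple extract; returning two records for one input shows the "multiple output records" case.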
I don't want to regurgitate the
entire scripting manual, but there is support for variable
declarations and conditional constructs (IF..ELSE..ENDIF), as well as
string manipulation for changing and updating the values of items.
There are also a number of predefined variables, such as $DATE and
$TIME.
Installation and Documentation
Installation is simple and
straightforward. Restore the installation job and stream it;
it creates the accounting structure, restores the files, and
then very kindly cleans up after itself.
There are two manuals. The user guide
weighs in at 175 pages, and the scripting guide is 45 pages. Both are
well-written and easy to use. The user guide is organized and has a
table of contents and such, while the scripting guide is more of an
alphabetized reference manual with some samples at the back.
There was one nit that I would pick with
Discover. Its first version has some help tailored only for users of
WRQ's Reflection terminal emulator. The manual recommends you set your
screen display to 36 rows by 126 columns and gives directions for doing
it in Reflection. They also include an RBA (Reflection Basic) script
that allows you to pop in Discover keywords from a pick list. While
this is a nice touch, it again leaves out the large number of people
using other emulators.
The Test Drive
The paradigm employed by Discover is a
little different from what I'm used to, but it only took a short
time to become familiar with it. I spent some time scanning through
my source code and databases looking for particular records. Since
one of my projects is an e-mail system, I was able to do lots of date
and pattern matching with this type of data.
Since I didn't have any other
real-world problems to solve, I mostly just did some random research
to test the various features of the product. Everything worked as
advertised, and performance was just fine.
Conclusions
I was a little skeptical of Discover at
first, because it was unclear to me what it was trying to be. After
spending some time with it, I came away with two basic impressions.
First: there are an awful lot of arbitrary limitations in the product,
such as the number of search and exclusion parameters, the record width
of a file being read, and the length of commands in a job stream. I
don't think it would be that much harder to remove those limitations
entirely.
Second, limitations aside, Discover is one of those tools
that is probably a hard sell, because you don't really realize
how useful it is until you become accustomed to using it and take
advantage of it to solve your problems. There is a significant amount
of power and versatility in Discover, but it's not something
that everyone is going to need. This product's ability to do
sophisticated text and data searches, as well as data
transformations, is a pretty slick thing to have. I suggest you spend
some time analyzing the kinds of troubleshooting work you do.
It might be that Discover can help you with the development and
maintenance of your HP 3000 systems.