Trailing-Edge
-
PDP-10 Archives
-
decuslib20-02
-
decus/20-0041/kwic.mem
There are 2 other files named kwic.mem in the archive. Click here to see a list.
KWIC.RNO%1-1[5 Nov 1977]/HDT
KWIC
Key-Word-In-Context Listing Program
DESCRIPTION
KWIC produces a listing of a file in which the key words are listed in
alphabetical order within the context of the lines in which the words
appear. The listing would be useful for maintaining listings of
bibliographies by author and subject or for maintaining listings of any
kind of a catalog.
USER INSTRUCTIONS
The program uses two input files: a master file which contains the
lines to be keyword-indexed and a "stop" file which contains words which
are not to be considered keywords. I.e., keywords are those words in the
master file which are not listed in the "stop" file. A default file with
common words such as "and", "the", "a", "an", etc., is available for those
who don't need a special stop file.
The program produces two output files: a file which contains the
listing and a file which lists the frequency of occurrence of the keywords.
The default device for all files is DSK:.
To use KWIC, create the master file according to the instructions
below and optionally create the "stop" file according to the instructions
below. Then type the command:
.L KWIC
If the program file is not on the library area, see the
programmer/consultant.
KWIC will start by requesting the name of the master file:
MASTER FILE:
This filename is required. Type in the name of the file and extension of
the file (note that all filenames must be typed as uppercase characters!).
KWIC will request the name of the "stop" file:
STOP FILE:
Supply the name and extension of your stop file if you have one. The
default extension is ".STP". If you do not want to create your own stop
file, just type a carriage return to use the default stop file, KWIC.STP,
which is available from the library area.
KWIC.RNO%1-1[5 Nov 1977]/HDT Page 2
User Instructions
KWIC will ask for the name of the listing (index) file:
INDEX FILE:
If you type a carriage return, KWIC will give the listing file the name of
your master file and the extension of ".NDX". If this is not acceptable,
type in your own filename and extension. (Default extension is ".NDX".)
KWIC will now ask for the name of the output frequency file:
FREQUENCY FILE:
Again, if you type a carriage return, KWIC will give the fequency file the
name of your master file and the extension ".FRQ". If this is not
acceptable, type in your own filename and extension. (Default extension is
".FRQ".)
Finally, KWIC asks for a listing title:
LISTING TITLE:
The title typed in will be printed at the top of each page. It can be up
to 80 characters long.
KWIC now reads in the stop file and informs you of the core required,
then reads in the master file and informs you of the additional core
required. Finally, KWIC generates the listing and frequency files.
If an error is detected in the master file, KWIC tries to indicate the
line in error by typing its line number. If an error occurs and your
master file does not have sequence numbers, run your master file through
SOS or LINED to generate line sequence numbers, then run KWIC again.
Sequence numbers can be removed when the master file has been edited to be
correct.
Master File Format
The master file consists of the data to be indexed by the KWIC
program. This may be any type of alphanumeric data. For example, the data
could be many book titles in a specific area of study or possibly a whole
library's catalog. But the program is flexible enough to allow KWIC
indexing of a thesis paper or similar document.
The lines of the file contain three or optionally four fields of data:
1. A standard DEC line sequence number (optional).
2. The field of data to be indexed. This may be continued on any
number of lines in the file itself. That is, carriage return-line
feeds are ignored within this field. The field is terminated by
an "=" character. (Note that this means you can't have an "="
character within the text of the data field!) The words in this
field are the ones used to create the keyword-index listing.
KWIC.RNO%1-1[5 Nov 1977]/HDT Page 3
User Instructions
Words are strings of characters delimited by one or more spaces or
tabs.
3. A field of auxiliary information which you want to be contained in
the master file for listing purposes but over which you don't want
indexing to be done. For example, on a bibliographic listing, you
might list the publisher name and copyright date in this field.
(Optional)
4. The identification field. This consists of a pair of "[" and "]"
characters surrounding a field of up to 10 characters used to
uniquely identify each item to be indexed. Warning: use no tabs
or spaces in this field. The "[" character starts this field and
delimits the end of the auxiliary information field, and the "]"
character delimits the end of the ID field and of the line to be
indexed.
NOTE
Since the carriage return-line feed is
ignored, to continue a word on another line
the user merely types the rest of the word
with no spaces. But if he wishes to delimit
the word with a space he must type it either
at the end of the line or at the beginning of
the continuation line. For example,
The Ohio State Univ
ersity
represents the continuation of the same word
across a line.
The Ohio State
University
represents two separate words.
There are two conventions which may help new users of KWIC organize
their indices better. The first is to place all author names (for example)
in parentheses. This will make all the author names appear in one group in
the listing. The second convention is to use a "/ " in front of any word
which is not in the title but which is of value in classifying the index
item.
Stop File Format
KWIC.RNO%1-1[5 Nov 1977]/HDT Page 4
User Instructions
The stop file consists of words which the user feels have no value as
index terms for his particular application. One such stop list is
available from the library as KWIC.STP. But if a user finds that other
"useless" terms appear in his listing, he may supplement the original stop
list by creating his own.
The stop list is a file of words, one word per line, which is sorted
in alphabetical order. The file may have standard DEC sequence numbers.
Tabs and spaces are ignored in this file.
To create a stop file of his own, the user may start by copying
KWIC.STP from the library area. It might be named KWIC.STP in the user's
area, but any other name will do. It is most convenient to use the
extension ".STP", but that is not required. Now edit that stop file and
insert into it the additional words which are to be excluded from the
keyword indexing. Insert these words in alphabetical order (unless you
know how to use the SORT program to reorder the file). Delete from the
stop file any words which you feel would be of use to you as keywords.
Now when you run KWIC, give the name of this file as the STOP FILE.
EXAMPLE
The following file, named EXAMP.MAS, was created as a disk file:
(McCracken) A Guide to ALGOL Programming=
Wiley, 1967 [M-101]
(McCracken) A Guide to FORTRAN Programming=
Wiley, 2nd edition, 1972 [M-102]
(Didday) and (Page) FORTRAN for Humans=West, 1975[DP-101]
(Meissner) The Science of Computing= Wadsworth, 1972 [M-103]
(Murrill) and (Smith) Intro to Computer Science=
IEP, 1970 [M-104]
(Pizer) Numerical Computing and Math Analysis=SRA, 1974[P-101]
The KWIC program was run with the following commands:
.L KWIC
KEY-WORD-IN-CONTEXT
MASTER FILE: EXAMP.MAS
STOP FILE: <typed RETURN key>
INDEX FILE: <typed RETURN key>
FREQUENCY FILE: <typed RETURN key>
LISTING TITLE: EXAMPLE OF KWIC
The listing file, EXAMP.NDX, which was created from this example file and
command sequence was the following:
KWIC.RNO%1-1[5 Nov 1977]/HDT Page 5
Example
KEY-WORD-IN-CONTEXT VERSION 1 14:13 5-NOV-77 PAGE 1
CURRENT STOP LIST----EXAMPLE OF KWIC
(a listing of stop words appears here)
KEY-WORD-IN-CONTEXT VERSION 1 14:13 5-NOV-77 PAGE 2
KWIC INDEX---EXAMPLE OF KWIC
(the following listing has been truncated on the
left side for the purposes of printing this
document)
(DIDDAY) AND (PAGE) FORTRAN FOR HUMANS [DP-101
(MCCRACKEN) A GUIDE TO ALGOL PROGRAMMING [M-101
(MCCRACKEN) A GUIDE TO FORTRAN PROGRAMMING [M-102
(MEISSNER) THE SCIENCE OF COMPUTING [M-103
(MURRILL) AND (SMITH) INTRO TO COMPUTER SCIENCE [M-104
IDDAY) AND (PAGE) FORTRAN FOR HUMANS [DP-101
(PIZER) NUMERICAL COMPUTING AND MATH ANALYSIS [P-101
RRILL) AND (SMITH) INTRO TO COMPUTER SCIENCE [M-104
A GUIDE TO ALGOL PROGRAMMING [M-101
G AND MATH ANALYSIS [P-101
) INTRO TO COMPUTER SCIENCE [M-104
SCIENCE OF COMPUTING [M-103
NUMERICAL COMPUTING AND MATH ANALYSIS [P-101
A GUIDE TO FORTRAN PROGRAMMING [M-102
AND (PAGE) FORTRAN FOR HUMANS [DP-101
CRACKEN) A GUIDE TO ALGOL PROGRAMMING [M-101
CRACKEN) A GUIDE TO FORTRAN PROGRAMMING [M-102
ORTRAN FOR HUMANS [DP-101
ND (SMITH) INTRO TO COMPUTER SCIENCE [M-104
PUTING AND MATH ANALYSIS [P-101
(PIZER) NUMERICAL COMPUTING AND MATH ANALYSIS [P-101
E TO ALGOL PROGRAMMING [M-101
TO FORTRAN PROGRAMMING [M-102
SSNER) THE SCIENCE OF COMPUTING [M-103
O COMPUTER SCIENCE [M-104
KWIC.RNO%1-1[5 Nov 1977]/HDT Page 6
Restrictions
RESTRICTIONS
1. The line size for the listing (.NDX) file is set at 132
(line-printer width). This can be changed as a parameter in the
program source file.
2. The listing file contains 50 lines per page. (Can be changed in
the source program.)
3. The maximum number of characters per word is 50. (Can be changed
in the source program.)
4. Maximum number of occurrences of a keyword is 300. (Can be
changed in the source program.)
5. Size of the ID field is 10 characters. (Can be changed in the
source program.)
6. Words listed in the stop list will be truncated to 12 characters;
all characters of the stop word are used in the processing,
however.
Error Messages
1. CANNOT INIT XXXXX DEVICE
Device specified in an input parameter or a default specification
was not correct or is unavailable to the user.
2. CANNOT FIND XXXXX FILE
The file specifed ('XXXXX') could not be found. Probably a
mistype (though note that KWIC accepts only upper case letters for
file names).
3. CANNOT ENTER XXXXX FILE
The directory on the output device specifed for the 'XXXXX'
listing was full.
4. ?READ ERROR ON 'XXXXX' DEVICE
A device error occurred on the 'XXXXX' file while reading.
Probably a hardware error.
5. ?WRITE ERROR ON 'XXXXX' FILE
A device error occurred on the 'XXXXX' file while writing.
Probably a hardware error.
KWIC.RNO%1-1[5 Nov 1977]/HDT Page 7
Error Messages
6. ?MASTER FILE NO LONGER AVAILABLE
The program releases the master file for a short period while it
reads in the stop list file. This is so on a DECtape system these
two files may be on the same drive. This error occurs when it
looks for the file the second time (after the stop list has been
read in) and cannot find it. This should never occur. Fatal
error.
7. ?FATAL UUO FAILURE - BADFAL -
A CORE UUO failed while deallocating core. Should be an
impossible situation. Fatal error.
8. ?MAXIMUM WORD SIZE EXCEEDED
WORD=CCCCCC
A word longer than the length specified by the 'SIZWRD' parameter
in the KWIC program (currently set at 50) was exceeded. The
string 'CCCCCC' will be the word in error. Fatal error.
9. ?TOO MANY MATCHES FOR ARRAY
WORD=CCCCCC
More than the number of identical index items specified by the
'MAXSAM' parameter in the KWIC source program was exceeded
(currently set at 300). The word 'CCCCCC' is the word which
occurred an excessive number of times. Fatal error.
10. ?CORE UUO FAILED--TRYING AGAIN
If the CORE UUO fails while trying to read in data, this message
is printed. After a 30 second delay, the program will try again
to allocate the core. It continues looping until it gets the
core. This is useful for non-swapping systems where a user can
wait for the core needed to be freed. Should be considered a
fatal error for a swapping system.
11. ?ERROR IN LINE NNNNN---
The line NNNNN or one of the nearby lines is bad. If the master
input file does not have line sequence numbers, no number 'NNNNN'
is printed. In this case, the user should run SOS or LINED on the
master file, then rerun KWIC to find the line with the error. The
syntax error types are listed below. The errors are fatal.
1. ---I.D. NUMBER TOO LONG
Means the size of the ID number is greater than the 'IDSIZ'
parameter in the source program (currently set at 10).
2. ---NO I.D. NUMBER FOUND
A line is missing the ID field ("[" and "]" characters).
KWIC.RNO%1-1[5 Nov 1977]/HDT Page 8
Error Messages
3. ---NO SORT DELIM FOUND
Missing an "=" on a line
4. ---SYNTAX ERROR
Undiagnosable error. (Examine the line and preceding line
carefully.)
ORIGIN
The KWIC program was written by G. B. Moersdorf at Ohio State
University. It is available from the DECUS DEC-10/20 program library.
ADDITIONAL DOCUMENTATION
Additional, more technical documentation is available as KWIC.DOC from
the DECUS library tapes.
[End of KWIC.RNO]