Google
 

Trailing-Edge - PDP-10 Archives - decus_20tap2_198111 - decus/20-0041/kwic.mem
There are 2 other files named kwic.mem in the archive. Click here to see a list.


     KWIC.RNO%1-1[5 Nov 1977]/HDT


                                        KWIC
                         Key-Word-In-Context Listing Program


     DESCRIPTION

          KWIC produces a listing of a file in which the key words are listed in
     alphabetical order within the context of the lines in which the words
     appear.  The listing would be useful for maintaining listings of
     bibliographies by author and subject or for maintaining listings of any
     kind of a catalog.


     USER INSTRUCTIONS

          The program uses two input files:  a master file which contains the
     lines to be keyword-indexed and a "stop" file which contains words which
     are not to be considered keywords.  I.e., keywords are those words in the
     master file which are not listed in the "stop" file.  A default file with
     common words such as "and", "the", "a", "an", etc., is available for those
     who don't need a special stop file.

          The program produces two output files:  a file which contains the
     listing and a file which lists the frequency of occurrence of the keywords.

          The default device for all files is DSK:.

          To use KWIC, create the master file according to the instructions
     below and optionally create the "stop" file according to the instructions
     below.  Then type the command:

          .L KWIC

     If the program file is not on the library area, see the
     programmer/consultant.

          KWIC will start by requesting the name of the master file:

          MASTER FILE:

     This filename is required.  Type in the name of the file and extension of
     the file (note that all filenames must be typed as uppercase characters!).

          KWIC will request the name of the "stop" file:

          STOP FILE:

     Supply the name and extension of your stop file if you have one.  The
     default extension is ".STP".  If you do not want to create your own stop
     file, just type a carriage return to use the default stop file, KWIC.STP,
     which is available from the library area.
     KWIC.RNO%1-1[5 Nov 1977]/HDT                                         Page 2
     User Instructions


          KWIC will ask for the name of the listing (index) file:

          INDEX FILE:

     If you type a carriage return, KWIC will give the listing file the name of
     your master file and the extension of ".NDX".  If this is not acceptable,
     type in your own filename and extension.  (Default extension is ".NDX".)

          KWIC will now ask for the name of the output frequency file:

          FREQUENCY FILE:

     Again, if you type a carriage return, KWIC will give the fequency file the
     name of your master file and the extension ".FRQ".  If this is not
     acceptable, type in your own filename and extension.  (Default extension is
     ".FRQ".)

          Finally, KWIC asks for a listing title:

          LISTING TITLE:

     The title typed in will be printed at the top of each page.  It can be up
     to 80 characters long.

          KWIC now reads in the stop file and informs you of the core required,
     then reads in the master file and informs you of the additional core
     required.  Finally, KWIC generates the listing and frequency files.

          If an error is detected in the master file, KWIC tries to indicate the
     line in error by typing its line number.  If an error occurs and your
     master file does not have sequence numbers, run your master file through
     SOS or LINED to generate line sequence numbers, then run KWIC again.
     Sequence numbers can be removed when the master file has been edited to be
     correct.


                                 Master File Format

          The master file consists of the data to be indexed by the KWIC
     program.  This may be any type of alphanumeric data.  For example, the data
     could be many book titles in a specific area of study or possibly a whole
     library's catalog.  But the program is flexible enough to allow KWIC
     indexing of a thesis paper or similar document.

          The lines of the file contain three or optionally four fields of data:

          1.  A standard DEC line sequence number (optional).

          2.  The field of data to be indexed.  This may be continued on any
              number of lines in the file itself.  That is, carriage return-line
              feeds are ignored within this field.  The field is terminated by
              an "=" character.  (Note that this means you can't have an "="
              character within the text of the data field!) The words in this
              field are the ones used to create the keyword-index listing.
     KWIC.RNO%1-1[5 Nov 1977]/HDT                                         Page 3
     User Instructions


              Words are strings of characters delimited by one or more spaces or
              tabs.

          3.  A field of auxiliary information which you want to be contained in
              the master file for listing purposes but over which you don't want
              indexing to be done.  For example, on a bibliographic listing, you
              might list the publisher name and copyright date in this field.
              (Optional)

          4.  The identification field.  This consists of a pair of "[" and "]"
              characters surrounding a field of up to 10 characters used to
              uniquely identify each item to be indexed.  Warning:  use no tabs
              or spaces in this field.  The "[" character starts this field and
              delimits the end of the auxiliary information field, and the "]"
              character delimits the end of the ID field and of the line to be
              indexed.



                                        NOTE

                    Since the carriage return-line feed is
                    ignored, to continue a word on another line
                    the user merely types the rest of the word
                    with no spaces.  But if he wishes to delimit
                    the word with a space he must type it either
                    at the end of the line or at the beginning of
                    the continuation line.  For example,

                         The Ohio State Univ
                         ersity

                    represents the continuation of the same word
                    across a line.  

                         The Ohio State
                          University

                    represents two separate words.



          There are two conventions which may help new users of KWIC organize
     their indices better.  The first is to place all author names (for example)
     in parentheses.  This will make all the author names appear in one group in
     the listing.  The second convention is to use a "/ " in front of any word
     which is not in the title but which is of value in classifying the index
     item.


                                  Stop File Format
     KWIC.RNO%1-1[5 Nov 1977]/HDT                                         Page 4
     User Instructions


          The stop file consists of words which the user feels have no value as
     index terms for his particular application.  One such stop list is
     available from the library as KWIC.STP.  But if a user finds that other
     "useless" terms appear in his listing, he may supplement the original stop
     list by creating his own.

          The stop list is a file of words, one word per line, which is sorted
     in alphabetical order.  The file may have standard DEC sequence numbers.
     Tabs and spaces are ignored in this file.

          To create a stop file of his own, the user may start by copying
     KWIC.STP from the library area.  It might be named KWIC.STP in the user's
     area, but any other name will do.  It is most convenient to use the
     extension ".STP", but that is not required.  Now edit that stop file and
     insert into it the additional words which are to be excluded from the
     keyword indexing.  Insert these words in alphabetical order (unless you
     know how to use the SORT program to reorder the file).  Delete from the
     stop file any words which you feel would be of use to you as keywords.

          Now when you run KWIC, give the name of this file as the STOP FILE.


     EXAMPLE

          The following file, named EXAMP.MAS, was created as a disk file:


     (McCracken) A Guide to ALGOL Programming=
             Wiley, 1967 [M-101]
     (McCracken) A Guide to FORTRAN Programming=
             Wiley, 2nd edition, 1972 [M-102]
     (Didday) and (Page) FORTRAN for Humans=West, 1975[DP-101]
     (Meissner) The Science of Computing= Wadsworth, 1972 [M-103]
     (Murrill) and (Smith) Intro to Computer Science=
             IEP, 1970 [M-104]
     (Pizer) Numerical Computing and Math Analysis=SRA, 1974[P-101]


     The KWIC program was run with the following commands:

          .L KWIC

          KEY-WORD-IN-CONTEXT

          MASTER FILE: EXAMP.MAS
          STOP FILE: <typed RETURN key>
          INDEX FILE: <typed RETURN key>
          FREQUENCY FILE: <typed RETURN key>
          LISTING TITLE: EXAMPLE OF KWIC

     The listing file, EXAMP.NDX, which was created from this example file and
     command sequence was the following:
     KWIC.RNO%1-1[5 Nov 1977]/HDT                                         Page 5
     Example


     KEY-WORD-IN-CONTEXT VERSION 1 14:13 5-NOV-77                 PAGE 1
     CURRENT STOP LIST----EXAMPLE OF KWIC


             (a listing of stop words appears here)



     KEY-WORD-IN-CONTEXT VERSION 1 14:13 5-NOV-77                 PAGE 2
     KWIC INDEX---EXAMPLE OF KWIC


             (the following listing has been truncated on the
              left side for the purposes of printing this
              document)


                 (DIDDAY) AND (PAGE) FORTRAN FOR HUMANS                [DP-101
                 (MCCRACKEN) A GUIDE TO ALGOL PROGRAMMING              [M-101
                 (MCCRACKEN) A GUIDE TO FORTRAN PROGRAMMING            [M-102
                 (MEISSNER) THE SCIENCE OF COMPUTING                   [M-103
                 (MURRILL) AND (SMITH) INTRO TO COMPUTER SCIENCE       [M-104
     IDDAY) AND  (PAGE) FORTRAN FOR HUMANS                             [DP-101
                 (PIZER) NUMERICAL COMPUTING AND MATH ANALYSIS         [P-101
     RRILL) AND  (SMITH) INTRO TO COMPUTER SCIENCE                     [M-104
     A GUIDE TO  ALGOL PROGRAMMING                                     [M-101
     G AND MATH  ANALYSIS                                              [P-101
     ) INTRO TO  COMPUTER SCIENCE                                      [M-104
     SCIENCE OF  COMPUTING                                             [M-103
      NUMERICAL  COMPUTING AND MATH ANALYSIS                           [P-101
     A GUIDE TO  FORTRAN PROGRAMMING                                   [M-102
     AND (PAGE)  FORTRAN FOR HUMANS                                    [DP-101
     CRACKEN) A  GUIDE TO ALGOL PROGRAMMING                            [M-101
     CRACKEN) A  GUIDE TO FORTRAN PROGRAMMING                          [M-102
     ORTRAN FOR  HUMANS                                                [DP-101
     ND (SMITH)  INTRO TO COMPUTER SCIENCE                             [M-104
     PUTING AND  MATH ANALYSIS                                         [P-101
        (PIZER)  NUMERICAL COMPUTING AND MATH ANALYSIS                 [P-101
     E TO ALGOL  PROGRAMMING                                           [M-101
     TO FORTRAN  PROGRAMMING                                           [M-102
     SSNER) THE  SCIENCE OF COMPUTING                                  [M-103
     O COMPUTER  SCIENCE                                               [M-104
     KWIC.RNO%1-1[5 Nov 1977]/HDT                                         Page 6
     Restrictions


     RESTRICTIONS

          1.  The line size for the listing (.NDX) file is set at 132
              (line-printer width).  This can be changed as a parameter in the
              program source file.

          2.  The listing file contains 50 lines per page.  (Can be changed in
              the source program.)

          3.  The maximum number of characters per word is 50.  (Can be changed
              in the source program.)

          4.  Maximum number of occurrences of a keyword is 300.  (Can be
              changed in the source program.)

          5.  Size of the ID field is 10 characters.  (Can be changed in the
              source program.)

          6.  Words listed in the stop list will be truncated to 12 characters;
              all characters of the stop word are used in the processing,
              however.



     Error Messages

          1.  CANNOT INIT XXXXX DEVICE

              Device specified in an input parameter or a default specification
              was not correct or is unavailable to the user.

          2.  CANNOT FIND XXXXX FILE

              The file specifed ('XXXXX') could not be found.  Probably a
              mistype (though note that KWIC accepts only upper case letters for
              file names).

          3.  CANNOT ENTER XXXXX FILE

              The directory on the output device specifed for the 'XXXXX'
              listing was full.

          4.  ?READ ERROR ON 'XXXXX' DEVICE

              A device error occurred on the 'XXXXX' file while reading.
              Probably a hardware error.

          5.  ?WRITE ERROR ON 'XXXXX' FILE

              A device error occurred on the 'XXXXX' file while writing.
              Probably a hardware error.
     KWIC.RNO%1-1[5 Nov 1977]/HDT                                         Page 7
     Error Messages


          6.  ?MASTER FILE NO LONGER AVAILABLE

              The program releases the master file for a short period while it
              reads in the stop list file.  This is so on a DECtape system these
              two files may be on the same drive.  This error occurs when it
              looks for the file the second time (after the stop list has been
              read in) and cannot find it.  This should never occur.  Fatal
              error.

          7.  ?FATAL UUO FAILURE - BADFAL -

              A CORE UUO failed while deallocating core.  Should be an
              impossible situation.  Fatal error.

          8.  ?MAXIMUM WORD SIZE EXCEEDED
              WORD=CCCCCC

              A word longer than the length specified by the 'SIZWRD' parameter
              in the KWIC program (currently set at 50) was exceeded.  The
              string 'CCCCCC' will be the word in error.  Fatal error.

          9.  ?TOO MANY MATCHES FOR ARRAY
              WORD=CCCCCC

              More than the number of identical index items specified by the
              'MAXSAM' parameter in the KWIC source program was exceeded
              (currently set at 300).  The word 'CCCCCC' is the word which
              occurred an excessive number of times.  Fatal error.

         10.  ?CORE UUO FAILED--TRYING AGAIN

              If the CORE UUO fails while trying to read in data, this message
              is printed.  After a 30 second delay, the program will try again
              to allocate the core.  It continues looping until it gets the
              core.  This is useful for non-swapping systems where a user can
              wait for the core needed to be freed.  Should be considered a
              fatal error for a swapping system.

         11.  ?ERROR IN LINE NNNNN---

              The line NNNNN or one of the nearby lines is bad.  If the master
              input file does not have line sequence numbers, no number 'NNNNN'
              is printed.  In this case, the user should run SOS or LINED on the
              master file, then rerun KWIC to find the line with the error.  The
              syntax error types are listed below.  The errors are fatal.

              1.  ---I.D.  NUMBER TOO LONG

                  Means the size of the ID number is greater than the 'IDSIZ'
                  parameter in the source program (currently set at 10).

              2.  ---NO I.D.  NUMBER FOUND

                  A line is missing the ID field ("[" and "]" characters).
     KWIC.RNO%1-1[5 Nov 1977]/HDT                                         Page 8
     Error Messages


              3.  ---NO SORT DELIM FOUND

                  Missing an "=" on a line

              4.  ---SYNTAX ERROR

                  Undiagnosable error.  (Examine the line and preceding line
                  carefully.)




     ORIGIN

          The KWIC program was written by G.  B.  Moersdorf at Ohio State
     University.  It is available from the DECUS DEC-10/20 program library.


     ADDITIONAL DOCUMENTATION

          Additional, more technical documentation is available as KWIC.DOC from
     the DECUS library tapes.



     [End of KWIC.RNO]