PDP-10 Archive: 2,5/klepto.hlp from BB-FI82B-DD

1.0  INTRODUCTION

     KLEPTO is a damage assessment program for TOPS10  file  structures.
The  inspection  includes  tests  for  numerous types of errors, but the
emphasis is on SAT block errors.  KLEPTO will automatically correct  all
errors in the SAT blocks.

     KLEPTO is designed to be a replacement for DSKRAT and  RIPOFF.   It
is significantly faster than its predecessors.



2.0  HOW TO RUN KLEPTO

Type:

.AS dskx STR
.AS dsky LPT
.R KLEPTO

Where "dskx" is the name of the file  structure  to  be  processed,  and
"dsky"  is the name of the structure to place the log file on.  The name
of the log file will be dskx.LOG.  Note that if logical  name  "LPT"  is
not  assigned,  then  the log file will go to TTY:.  Note that a spooled
LPT (or any other spooled device) is not legal.



3.0  TYPES OF CORRUPTION

     When KLEPTO discovers a discrepancy  in  the  SAT  block,  it  will
categorize  it  into  one of three types:  lost clusters, free clusters,
and multiply used clusters.



3.1  Lost Clusters

     A lost cluster is one that the SAT block marks as in use, even though
no file anywhere on the structure has a pointer to it (i.e.  the SAT bit
is one but should be zero).

     Of the three types, a lost cluster is  by  far  the  least  severe.
Under  normal  day-to-day  operations it is expected that all the file
structures on your system will slowly accumulate lost clusters.  This is
to  be considered perfectly normal.  You should not be alarmed even if a
structure has a large number of lost clusters.

     Consider the following scenario:  The file X.DAT is 5000 blocks.  A
user  attempts  to copy X.DAT to Y.DAT.  The first 3000 blocks have been
copied when the system accidentally crashes.  At the time of the  crash,
the  file  Y.DAT had not yet been closed.  Thus there is no entry in the
UFD.  The 3000 blocks which had already been  allocated  to  Y.DAT  will
become lost.

     This scenario is typical of how lost clusters  are  created.   They
are created when the system crashes during the creation of a file.  Each
crash will probably create at least one lost cluster.  Some crashes will
create more.



3.2  Free Clusters

     A free cluster is the exact opposite of a  lost  cluster.   A  free
cluster is one that the SAT block marks as not in use, even though there
is, in fact, a file which uses the cluster (i.e.  the SAT  bit  is  zero
but should be one).

     The existence of a free cluster is not to be taken lightly.  A free
cluster  is a rather severe error.  The file which contains this cluster
is quite likely to be corrupt.  Although  KLEPTO  will  repair  the  SAT
block,  it  cannot  repair  the file itself.  To ensure the integrity of
your data, we suggest that you restore the  file  from  a  BACKUP  tape.
KLEPTO will tell you which file(s) contain free clusters.



3.3  Multiply Used Clusters

     As the name implies, a "multiply  used  cluster"  is  one  that  is
pointed to by more than one file.  This is to be  considered  a  serious
error.

     Note that a given cluster can easily switch from the state of being
free  to  the  state  of  being  multiply  used (and vice versa).  If the
monitor allocates a free cluster to some file, then the cluster  becomes
multiply  used.   If  one  of the two files is then deleted, the cluster
becomes free again.

     During the interval that the cluster is multiply used,  it's  quite
easy for one or both of the files to have its data corrupted.  Data from
the first file can overwrite data from the second file.  Likewise,  data
from the second file can overwrite data from the first file.  It's quite
possible that a given cluster contains  a  mixture  of  data  from  both
files.

     To be on the safe side, you  should  assume  that  both  files  are
corrupt.   Both  files  should  be  restored  from BACKUP tapes.  Before
restoring the files, however, you must  delete  the  corrupted  versions
using  the /S switch to DELFIL.  Failure to do this will result in a BAZ
stopcode.
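
     The three categories described in sections 3.1 through 3.3 can  be
summarized as a comparison between a cluster's SAT bit and the number of
files that point to it.  The sketch below is illustrative only.  The
names classify_cluster and ClusterState are hypothetical, and the choice
to report a cluster with two or more references as multiply used
regardless of its SAT bit is an assumption, not a statement of how
KLEPTO is coded.  Clusters listed in the BAT block are ignored here.

```python
from enum import Enum


class ClusterState(Enum):
    CONSISTENT = "consistent"
    LOST = "lost"
    FREE = "free"
    MULTIPLY_USED = "multiply used"


def classify_cluster(sat_bit: int, reference_count: int) -> ClusterState:
    """Compare a cluster's SAT bit with the number of files pointing at it.

    sat_bit is 1 when the SAT block says the cluster is in use.
    reference_count is how many files hold a pointer to the cluster.
    """
    if reference_count > 1:
        # Assumption: multiple references win over any SAT bit state.
        return ClusterState.MULTIPLY_USED
    if reference_count == 1 and sat_bit == 0:
        # A file uses the cluster but the SAT block says it is available.
        return ClusterState.FREE
    if reference_count == 0 and sat_bit == 1:
        # The SAT block says in use, but no file points here.
        return ClusterState.LOST
    return ClusterState.CONSISTENT
```

For example, classify_cluster(1, 0) yields ClusterState.LOST, which is
the state left behind by the interrupted copy in section 3.1.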



4.0  CHECKSUMMING

     There is a checksum associated with each retrieval pointer in each
file  on  the  structure.  The purpose of this checksum is to detect the
condition outlined above (when data from one file  overwrites  the  data
from  another  file).  KLEPTO will test each of these checksums and type
out a warning message if there are any discrepancies.

     Be aware, however, that the checksum only includes the  first  word
of  the  first block of the first cluster in a retrieval pointer.  It is
therefore quite possible for the file to be corrupt even if there is  no
checksum error.  For safety's sake, you should assume that any file with
a
free or multiply used cluster is corrupt.
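
     The limitation described above can be seen in a small model.  The
sketch below is not the monitor's actual checksum routine.  The function
fold_word, its 12-bit default width, and the folding rule itself are
hypothetical stand-ins.  The only property taken from the text is that
the checksum depends on the first word of the first block and nothing
else.

```python
from typing import Sequence

WORD_MASK = (1 << 36) - 1  # PDP-10 words are 36 bits wide


def fold_word(word: int, width: int = 12) -> int:
    """Hypothetical fold: add width-bit slices until the sum fits in width bits."""
    mask = (1 << width) - 1
    word &= WORD_MASK
    while word > mask:
        word = (word & mask) + (word >> width)
    return word


def pointer_checksum(first_block: Sequence[int]) -> int:
    """Checksum for one retrieval pointer: only word 0 of the first block counts."""
    return fold_word(first_block[0])


intact = [0o123456, 0o1, 0o2]
damaged_later_word = [0o123456, 0o777, 0o2]   # corruption after word 0
damaged_first_word = [0o123457, 0o1, 0o2]     # corruption in word 0

# Damage past the first word goes unnoticed; damage to the first word is caught.
same = pointer_checksum(intact) == pointer_checksum(damaged_later_word)
differs = pointer_checksum(intact) != pointer_checksum(damaged_first_word)
```

In this model both same and differs are true, which is why a clean
checksum does not prove that a file is intact.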



5.0  BAD BLOCKS

     When reading or writing  a  file,  if  the  monitor  encounters  an
unrecoverable  disk  error,  it  will  store the block number in the DDB
(location DEVELB).  Later, when the file is closed, DEVELB is copied  to
the  RIB (location RIBELB), and also to the BAT block.  Much later, when
the file is deleted, the monitor is careful to inspect  location  RIBELB
and  not  return  that cluster to the free pool.  Thus the bad spot will
not be reallocated to another file.

     Note that at this point in time the cluster is technically  "lost".
KLEPTO will not, however, include this cluster in its lost list.  KLEPTO
will not include any cluster which is listed in the BAT block.

     Note that the cluster will continue to be "technically lost"  until
the  next  time  the  structure  is refreshed.  The cluster will then be
inserted into BADBLK.SYS.



5.1  Bad Free Blocks

     If KLEPTO encounters a cluster which is listed in BAT but for which
the  SAT bit is zero (i.e.  the cluster is not contained in BADBLK.SYS),
then KLEPTO will list the cluster as being "free".   If  these  are  the
only  free  blocks,  KLEPTO  will print the message "All free blocks are
bad".  Pass two will be omitted.

     The existence of a bad free block is not a serious error.   Do  not
be  alarmed.  It means simply that RIBELB overflowed.  There's only room
in the RIB to list one bad spot.  If a file has two bad spots, then  one
will have to be dropped.  Both are listed in the BAT block, but only one
is recorded in the RIB.  When the file is deleted, the bad spot which is
not  recorded  in  RIBELB  will  become a bad free cluster.  KLEPTO will
light the corresponding SAT bit, which is what the  monitor  would  have
done if there had been more room in the RIB.



6.0  WHEN TO RUN KLEPTO

     The primary reason for running KLEPTO is to  recover  lost  blocks.
The  number  of  lost  blocks  you  can  expect  to  recover  is roughly
proportional  to  the  number  of  crashes  you've  experienced.   During
periods of frequent crashes you should run KLEPTO often.  During periods
of high stability it will not be necessary to run KLEPTO at all.

     There is no precise method for anticipating what the number of lost
blocks    will    be.    Contrary   to   common   belief,   DSKLST   and
DIRECT/ALLOCATE/SUM are not good indicators of actual file usage.   They
do  not include the files which are currently open.  Moreover, they take
substantially longer to run than KLEPTO itself.



7.0  DISMOUNT

     To prevent anybody from altering any file, KLEPTO will dismount the
structure during processing (single access is not good enough).  All I/O
will be performed as super I/O.

     When processing is complete, KLEPTO will automatically re-mount the
structure.   If  the  structure was initially in the system search list,
then KLEPTO will re-insert it.  If need be, KLEPTO will  also  re-insert
the  structure  into  the active swapping list and the system dump list.
KLEPTO will make no attempt, however, to restore  anybody's  job  search
list.



8.0  ALGORITHM

     In an internal data base, KLEPTO maintains a linked list of  nodes.
There's one node for each block on the disk that KLEPTO intends to read.
The list is sorted in ascending order by block number.  There are  three
types of nodes:  RIB's, directory data blocks, and checksum blocks.

     To prevent head thrashing, KLEPTO processes the list in  order  by
cylinder.  That is, KLEPTO will process all the nodes on a given cylinder
before moving the heads to a new cylinder.  Within a given cylinder, the
nodes  are  not necessarily processed in order.  The rotational position
of the disk will determine which block can be pulled into  core  in  the
least amount of time.
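
     The cylinder ordering described above can be sketched  as  follows.
This is a toy model, not KLEPTO's scheduler.  The geometry constants
BLOCKS_PER_CYLINDER and SECTORS_PER_TRACK are hypothetical values chosen
for illustration, and pick_next assumes that a block's rotational
position is simply its block number modulo the sectors per track.

```python
from itertools import groupby

BLOCKS_PER_CYLINDER = 380   # hypothetical disk geometry
SECTORS_PER_TRACK = 20      # hypothetical disk geometry


def cylinder_of(block: int) -> int:
    """Map a block number to the cylinder that holds it."""
    return block // BLOCKS_PER_CYLINDER


def batches_by_cylinder(sorted_blocks):
    """Split an ascending list of block numbers into one batch per cylinder."""
    return [(cyl, list(group))
            for cyl, group in groupby(sorted_blocks, key=cylinder_of)]


def pick_next(blocks, head_sector: int) -> int:
    """Within one cylinder, pick the block that rotates under the head soonest."""
    def wait(block: int) -> int:
        return (block % SECTORS_PER_TRACK - head_sector) % SECTORS_PER_TRACK
    return min(blocks, key=wait)
```

With these assumptions, batches_by_cylinder([1, 5, 380, 400, 800])
produces three batches, one each for cylinders 0, 1, and 2, and
pick_next([3, 17, 25], head_sector=10) selects block 17 even though it
is not the lowest-numbered block.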

     Not all transfers are one block long.   If  KLEPTO  needs  to  read
several  consecutive  blocks,  it will do so in a single transfer.  For
example, the entire data portion of a UFD is usually read  in  a  single
transfer.  The RIB of a file and the checksum block are normally read in
a single transfer.
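
     Merging consecutive blocks into one transfer can be  sketched  like
this.  The function name coalesce and the (start, length) tuple form are
illustrative choices, and the input is assumed to be an ascending list
of distinct block numbers.

```python
def coalesce(sorted_blocks):
    """Merge runs of consecutive block numbers into (start, length) transfers."""
    transfers = []
    for block in sorted_blocks:
        if transfers and block == transfers[-1][0] + transfers[-1][1]:
            # This block directly follows the previous transfer: extend it.
            start, length = transfers[-1]
            transfers[-1] = (start, length + 1)
        else:
            # A gap in block numbers starts a new transfer.
            transfers.append((block, 1))
    return transfers
```

For the list [10, 11, 12, 20, 30, 31] this returns three transfers
instead of six: (10, 3), (20, 1), and (30, 2).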

     All transfers are performed in non-blocking mode.  KLEPTO does  the
processing  for  one block while the transfer is in progress for another
block.

     Upon reading a checksum block, KLEPTO merely  computes  the  folded
checksum  and  compares this with the anticipated value.  Upon reading a
directory block, KLEPTO inserts a new node in the list for each entry in
the  directory.   Upon  reading a RIB, the action taken by KLEPTO varies
greatly depending on whether or not the file  is  a  directory.   For  a
non-directory  RIB,  KLEPTO  inserts  a  new  node  in the list for each
retrieval pointer (i.e.  each checksum block).   For  a  directory  RIB,
KLEPTO  inserts  a  new  node  in  the  list  for each data block in the
directory.
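
     The way each block read adds new nodes to the list can be  modeled
with a small work list.  The sketch below is a toy.  The disk is a
Python dictionary, the field names (kind, is_directory, data_blocks,
checksum_blocks, entries) are hypothetical, and the loop always takes
the lowest pending block number rather than modeling cylinders,
rotational position, or overlapped transfers.

```python
import heapq


def walk(disk, root_rib_block):
    """Visit every block reachable from a directory RIB, lowest block first."""
    pending = [root_rib_block]
    visited = []
    while pending:
        block = heapq.heappop(pending)
        node = disk[block]
        visited.append((block, node["kind"]))
        if node["kind"] == "rib" and node["is_directory"]:
            # Directory RIB: one new node per data block of the directory.
            new_nodes = node["data_blocks"]
        elif node["kind"] == "rib":
            # File RIB: one new node per retrieval pointer (its checksum block).
            new_nodes = node["checksum_blocks"]
        elif node["kind"] == "dir_data":
            # Directory data block: one new node per entry (the entry's RIB).
            new_nodes = node["entries"]
        else:
            # Checksum block: verified in place, nothing further to read.
            new_nodes = []
        for child in new_nodes:
            heapq.heappush(pending, child)
    return visited


toy_disk = {
    100: {"kind": "rib", "is_directory": True, "data_blocks": [101]},
    101: {"kind": "dir_data", "entries": [200, 300]},
    200: {"kind": "rib", "is_directory": False, "checksum_blocks": [201]},
    201: {"kind": "checksum"},
    300: {"kind": "rib", "is_directory": False, "checksum_blocks": [301, 350]},
    301: {"kind": "checksum"},
    350: {"kind": "checksum"},
}
order = [block for block, _ in walk(toy_disk, 100)]
```

Here order comes out as 100, 101, 200, 201, 300, 301, 350: the
directory RIB leads to its data block, which leads to two file RIBs,
each of which leads to its checksum blocks.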

     The algorithms used by KLEPTO are rather  core  intensive.   For  a
complex  structure,  KLEPTO  will need to store a large number of nodes.
But there are few structures that will require more than 200P.  Under no
circumstances, however, will KLEPTO allow itself to go virtual.  If core
gets tight, KLEPTO will alter its scheduling algorithm  to  ensure  that
some core is returned.



9.0  PASS TWO

     If the structure has any free clusters or multiply  used  clusters,
then  KLEPTO  will perform a second pass.  The purpose of this  pass  is
solely to print diagnostic messages.  KLEPTO will list  each  file  that
references  the  cluster.  KLEPTO cannot do this on the first pass as it
does not yet know which clusters are multiply used and/or free.

     Note that checksums are not computed on pass two.
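
     Taken together with section 5.1, the decision to run pass two  can
be sketched as below.  The function plan_pass_two and its arguments are
hypothetical.  The sketch assumes that a free cluster listed in the BAT
block never triggers pass two on its own, and that multiply used
clusters always do.  The text does not say what happens when both
conditions occur together, so that case is an assumption.

```python
def plan_pass_two(free_clusters, multiply_used_clusters, bat_clusters):
    """Return (run_pass_two, messages) from the results of pass one."""
    # Free clusters that are NOT in the BAT block are genuine errors.
    real_free = [c for c in free_clusters if c not in bat_clusters]
    messages = []
    if free_clusters and not real_free:
        # Every free cluster is a known bad spot (section 5.1).
        messages.append("All free blocks are bad")
    run_pass_two = bool(real_free or multiply_used_clusters)
    return run_pass_two, messages
```

For example, plan_pass_two([0o100], [], {0o100}) returns False together
with the "All free blocks are bad" message, so pass two is omitted.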



10.0  ALL STRUCTURES

     To process all the structures on the system, type:

.AS ALL STR
.AS dsky LPT
.R KLEPTO

Where dsky is the name of the structure  to  place  the  log  files  on.
KLEPTO  will  create a separate log file for each structure it processes
(e.g.  the log file for structure DSKX will be DSKY:DSKX.LST).

     Note that KLEPTO cannot place DSKY.LST directly on DSKY:   as  DSKY
is not mounted at the time.  KLEPTO will create a TMP file on some other
structure.  When the processing of DSKY is completed, KLEPTO  will  copy
the log file to its proper place.



11.0  MULTIPLE KLEPTOS

     When processing all the structures on the system, it often helps if
you run multiple copies of KLEPTO.  If you run two copies, for  example,
the
job will get done in roughly half the time.  Don't run too many  copies,
however,  as this can do more harm than good.  The optimum number varies
from system to system.  We suggest that you try a  few  experiments  and
see  what works best on your own configuration.  As a starting point, we
suggest you try CPUN+1 (one plus the number of CPU's).

     Note that the various copies of KLEPTO communicate with each  other
via  a  shared  HISEG.   By  mutual  consent, they agree which copy will
process which structure and in what order.  The scheduling algorithm  is
very  complex  and  considers  many  factors.   The  prime  goal  is, if
possible, to give each copy of KLEPTO a dedicated disk channel.



12.0  CPU SPECIFICATION

     Do not use the "SET CPU" command  when  running  KLEPTO.   Although
there  are a few cases where the command would help, they are not at all
obvious.  In fact, they are extremely counterintuitive.

     KLEPTO knows exactly what these cases are  and  will  set  the  CPU
specification if necessary.  Don't interfere by using the SET command.



13.0  HPQ

     Don't use the "SET HPQ" command when running KLEPTO.  KLEPTO  won't
run  substantially  faster as a result.  Moreover, the "SET HPQ" command
will have an adverse effect on the other jobs that might be running.



14.0  OCTAL VERSUS DECIMAL

     Some of the numbers KLEPTO types are octal and  some  are  decimal.
The  rule  is  as  follows:  Counters are always decimal, and everything
else is octal.  Thus block numbers  and  cluster  addresses  are  always
octal.  But the count of lost clusters, for example, is decimal.
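
     The rule can be illustrated with two small formatting helpers.  The
names format_address and format_count are hypothetical and only show how
the same value looks under each convention.  They do not reproduce
KLEPTO's actual output layout.

```python
def format_address(value: int) -> str:
    """Block numbers and cluster addresses are typed in octal."""
    return format(value, "o")


def format_count(value: int) -> str:
    """Counters, such as the number of lost clusters, are typed in decimal."""
    return str(value)
```

The value 512 is typed as 1000 when it is a block number and as 512 when
it is a count, so a number in the log must be read according to what it
describes.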



15.0  7.01A VERSUS 7.02

     KLEPTO was designed to run under version 7.02 of the  monitor.   It
will, however, run under 7.01A, but at slightly reduced speed.



16.0  KNOWN BUGS AND DEFICIENCIES

     There are no known bugs in KLEPTO itself.   KLEPTO  does,  however,
exercise a bug in version 4(1150) of QUASAR.

     When KLEPTO dismounts a file structure,  QUASAR  correctly  notices
this  fact.   When  KLEPTO  re-mounts  the structure, however, QUASAR is
oblivious.  The result is that GALAXY refuses  to  touch  any  structure
that  has  been  processed by KLEPTO.  GALAXY insists that the structure
does not exist.  PCO  10-702-62  will  correct  this  problem.   It  is,
however,  a difficult PCO to install.  As a workaround, you can dismount
the structure with OMOUNT and issue a recognize command to OPR.