PDP-10 Archive: t1s702.d08 from bb-bt99g-bb

                 EDIT DESCRIPTIONS FOR TOPS-10-KS-V702                          
  
  
                             EDIT 11063  FOR 702
  
[SYMPTOM]
  
     Stopcode WEM.
  
  
[DIAGNOSIS]
  
     Incorrect looping  in  module  NETDEV.MAC,  routine  VTMDEQ  when
dequeuing an LDB from the VTMQUE.
  
  
[CURE]
  
     Move label VTMDE1 up one line. See related filcom. This supersedes
PCO 10-NETSER-085.
********************************************************************************
  
  
                             EDIT 11065  FOR 702
  
[SYMPTOM]
  
     If a user connected to a local RSX20F terminal line sets host and
sets TTY FORM, it does not take effect.
  
[DIAGNOSIS]
  
     The status message is not inspected for the form feed bit.
  
[CURE]
  
     Make NETDEV look at the bit.  See related filcom.
********************************************************************************
  
  
                             EDIT 11073  FOR 702
  
[SYMPTOM]
  
Bad code in PAGOUT, resulting in the JOBINT interrupt  block
being paged out.
  
  
[DIAGNOSIS]
  
PAGOUT compares pages against addresses.
  
  
[CURE]
  
Compare pages against pages and check the end of  the  block
as well as the beginning.
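
A minimal sketch of the corrected comparison, in Python rather than
the monitor's MACRO-10.  The 512-word page size is the PDP-10's; the
function names and the block-length parameter are hypothetical and are
not taken from PAGOUT.

```python
PAGE_SHIFT = 9  # PDP-10 pages are 512 (2**9) words


def page_of(addr):
    """Return the page number that contains a word address."""
    return addr >> PAGE_SHIFT


def block_touches_page(block_start, block_len, page):
    """True if any word of the block lies in the given page.

    Page numbers are compared with page numbers, and the last word
    of the block is checked as well as the first.
    """
    first = page_of(block_start)
    last = page_of(block_start + block_len - 1)
    return first <= page <= last
```

A block that straddles a page boundary is then seen as occupying both
pages, so neither of them would be chosen for page-out.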
********************************************************************************
  
  
                             EDIT 11083  FOR 702
  
[SYMPTOM]
  
     Various stopcodes during swapping.  The  most  likely  candidates
are resource confusion related stopcodes, IMEs, or UILs.
  
  
[DIAGNOSIS]
  
     MCO 11083 as it appears in 7.02 is incomplete and can  return  to
scan the swap-out queues without owning the scheduler interlock.
  
  
[CURE]
  
     Hold onto the scheduler interlock until after we're sure we won't
have to scan the queues again.
********************************************************************************
  
  
                             EDIT 11085  FOR 702
  
[SYMPTOM]
  
     Problems with RP20s:   1).   A  servo  problem  on  one
drive  can cause other drives to drop off line, 2).  Running
diagnostics on one drive (from the  maintenance  panel)  can
cause  trouble for the other drives (KCP stopcode, drive off
line, spurious I/O error).
  
[DIAGNOSIS]
  
     The device driver can't issue  a  new  command  if  the
controller  is  still  busy  from the previous command (i.e.
the GO bit is still up).  The above two conditions can cause
the  controller  to remain busy for longer than anticipated.
The monitor times out and decides to issue the next command.
This hangs the entire subsystem.
  
[CURE]
  
     Exit from the device driver  without  issuing  the  new
command.   Wait  for  the  previous  command  to  signal  an
interrupt, then tell FILIO that both commands are done (this
is  a  lie).   If the new command is a transfer, then lie to
FILIO and tell him "position done".  FILIO will  think  he's
recovering  from  a  hung  transfer  and  will  re-issue the
transfer command.  If the new command is  a  position,  then
tell  FILIO  "recal done".  FILIO will think he's recovering
from a hung position and will re-issue the position command.
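
The substitution described above can be summarized as a small lookup.
This is only an illustration in Python; the command and completion
strings are labels chosen here, not the driver's or FILIO's symbols.

```python
# What the driver reports to FILIO for a command it could not issue
# because the controller was still busy.  Each answer is deliberately
# "one step short", so FILIO believes the command hung and re-issues it.
FALSE_COMPLETION = {
    "transfer": "position done",
    "position": "recal done",
}


def completion_to_report(new_command):
    """Return the completion to report for the unissued command."""
    return FALSE_COMPLETION[new_command]
```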
  
     Until this patch is installed, you should  not  attempt
to   run  diagnostics  from  the  maintenance  panel  during
timesharing.   This  patch  does  not   affect   user   mode
diagnostics (i.e.  those run via DIAMON).
********************************************************************************
  
  
                             EDIT 11090  FOR 702
  
[SYMPTOM]
  
     PULSAR  gets  undeserved  hard  data  errors  after   positioning
magtapes.   Labels  on tapes could go undetected, allowing tapes to be
mounted as unlabeled.
  
  
[DIAGNOSIS]
  
     Backspace operations on a TM02/TM03 return  correctable  data/CRC
error  and  frame  count errors when BOT is reached and the density is
set to 800 BPI.
  
  
[CURE]
  
     Ignore  these  errors  as  they're  not  meaningful   since   the
controller already told us we successfully reached BOT.
********************************************************************************
  
  
                             EDIT 11093  FOR 702
  
[SYMPTOM]
  
     Stopcode IBZ.
  
  
[DIAGNOSIS]
  
     Doing an OPEN for a channel based on a channel (as  a  result  of
UDXDDB) re-uses the old JDA bits on the new (copied) DDB.
  
  
[CURE]
  
     TLZ F,-1
********************************************************************************
  
  
                             EDIT 11094  FOR 702
  
[SYMPTOM]
  
     Too easy for a naive user to assign all the  free  TTY  DDBs  and
cause "?Job capacity exceeded" messages.
  
  
[DIAGNOSIS]
  
     A simple double control-C  on an assigned or initialized terminal
leaves the job with a detached DDB hanging around, which is often only
accessible via DEASTY (and thus not  useable  by  the  program).   The
current  method  of  determining  when  to  kill  a  TTY DDB makes the
following checks:
  
     1)  If TTYATC or  ASSPRG  is  lit,  don't  kill  it,  it's  still
         accessible.
  
     2)  If ASSCON is lit, don't kill it, it's still accessible.
  
     3)  Otherwise, kill it off.
  
The problem is that point 2 is not always valid.
  
  
[CURE]
  
     Make the checks more robust as follows:
  
     1)  If TTYATC or  ASSPRG  is  lit,  don't  kill  it,  it's  still
         accessible.
  
     2)  If ASSCON is clear, kill it, it's useless.
  
     3)  If DEVNAM is still valid (i.e.,  not  clobbered  by  TTYDT1),
         keep it around, it's still accessible.
  
     4)  If DEVLOG is non-zero, it's still accessible via the  logical
         name, so keep it.
  
     5)  Otherwise, kill it off.
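
     The five checks can be read as a single decision procedure.  The
sketch below restates them in Python; the function and its boolean
parameters are hypothetical stand-ins for the DDB bits and words named
above, not the monitor's code.

```python
def should_kill_tty_ddb(ttyatc, assprg, asscon, devnam_valid, devlog):
    """Return True if the TTY DDB should be killed off."""
    if ttyatc or assprg:      # 1) still accessible
        return False
    if not asscon:            # 2) ASSCON clear: useless
        return True
    if devnam_valid:          # 3) DEVNAM not clobbered by TTYDT1
        return False
    if devlog != 0:           # 4) reachable via the logical name
        return False
    return True               # 5) otherwise, kill it off
```

     The old behavior corresponds to returning False as soon as ASSCON
is found lit, which is what left the detached DDBs lying around.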
********************************************************************************
  
  
                             EDIT 11096  FOR 702
  
[SYMPTOM]
  
     IME  during  FILOP  function  .FORRC  (rewrite  RIB  if
changed).  This problem is frequently exercised by RMS.
  
  
[DIAGNOSIS]
  
     Junk in LH of U looks like section number.
  
  
[CURE]
  
     Don't put junk in LH of U.
********************************************************************************
  
  
                             EDIT 11097  FOR 702
  
[SYMPTOM]
  
     Disk quotas messed up.
  
  
[DIAGNOSIS]
  
     The monitor does exactly what it's told.   The  problem
is,  the  monitor isn't defensive enough.  It's too easy for
a program to tell the monitor to do the wrong thing.
  
     Scenario:  Since logging in, job X has created  several
very large files.  RIBUSD therefore differs from UFBTAL by a
large amount.  RIBUSD is the word in the RIB of the UFD that
keeps track of how many blocks are used.  UFBTAL is a "copy"
of RIBUSD which is kept in  core.   RIBUSD  is  not  updated
while the job is logged in.  Only UFBTAL is accurate.  Job Y
then runs a privileged program to change, for  example,  the
spool  name  on job X's UFD.  To do this, the program does a
LOOKUP of the UFD, alters RIBSPL, and then  does  a  RENAME.
Now the LOOKUP returned a value of RIBUSD which was known to
be inaccurate.  This value  was  passed,  in  turn,  to  the
RENAME  UUO.   Being  a  privileged  program,  the RENAME is
allowed to set the quota to anything it wants.  Thus  UFBTAL
is  set  to  RIBUSD  (which is known to be inaccurate).  The
correct procedure is, of course, to set RIBUSD to -1  before
doing the RENAME UUO.  This indicates that you don't want to
change the value.  You'd be surprised at how  many  programs
don't  set  RIBUSD  to  -1.   DIP,  BACKUP,  and  LOGIN, for
example.
  
  
[CURE]
  
     The LOOKUP UUO shouldn't return the value of RIBUSD  as
found  in the RIB.  Get the value from UFBTAL instead.  That
way  it  has  a chance  of working if a program  forgets the
SETOM.
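
     The scenario can be modeled in a few lines.  This is a toy Python
model under stated assumptions: the class and function names are
invented for illustration, and only the roles of RIBUSD, UFBTAL, and
the -1 convention come from the description above.

```python
KEEP = -1  # RIBUSD argument meaning "do not change the block count"


class Ufd:
    def __init__(self, ribusd_on_disk, ufbtal_in_core):
        self.ribusd = ribusd_on_disk  # stale while the owner is logged in
        self.ufbtal = ufbtal_in_core  # accurate in-core count


def lookup_old(ufd):
    """Old LOOKUP: returns RIBUSD as found in the RIB."""
    return ufd.ribusd


def lookup_new(ufd):
    """Patched LOOKUP: returns the in-core UFBTAL instead."""
    return ufd.ufbtal


def privileged_rename(ufd, ribusd_arg):
    """Privileged RENAME: sets UFBTAL unless the argument is -1."""
    if ribusd_arg != KEEP:
        ufd.ufbtal = ribusd_arg
```

     With the old LOOKUP, passing the returned value straight back to
the RENAME overwrites the accurate count with the stale one; with the
patched LOOKUP the same careless program writes back the value that
was already there.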
********************************************************************************
  
  
                             EDIT 11103  FOR 702
  
[SYMPTOM]
  
MERGE of an execute only file does not return an error code.
  
  
[DIAGNOSIS]
  
Since this is a SAVE/GET, FILSER does not  supply  an  error
code, COMCON makes the decision.
  
  
[CURE]
  
Supply an error code of PRTERR (2) when  rejecting  a  merge
because the file is execute only.
********************************************************************************
  
  
                             EDIT 11104  FOR 702
  
[SYMPTOM]
  
     %LDSWP has wrong value stored  in  it  at  ONCE-only  time.   One
symptom of this is the failure of OMOUNT to run.
  
  
[DIAGNOSIS]
  
     The code  wants  to  move  the  right  half-word  into  the  left
half-word if  there  are  no  units in the ASL, and checks for this by
seeing if the entire word is greater than the maximum number of  units
possible in the ASL (if a unit in the ASL was found, its address would
have already been put in the left half).  This doesn't  work  in  7.02
when  the  UDB addresses (the contents of the left half) have the sign
bit on because they are greater than 400000.
  
  
[CURE]
  
     Use TLNN T1,-1 instead of CAIG T1,7.
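
     The difference between the two tests can be shown with a small
Python model of a 36-bit word.  CAIG skips when the accumulator, taken
as a signed 36-bit integer, is greater than the immediate operand;
TLNN skips when any of the masked left-half bits is non-zero.  The
helper names and the sample addresses are illustrative only.

```python
WORD_MASK = (1 << 36) - 1
SIGN_BIT = 1 << 35
HALF_MASK = 0o777777


def to_signed(word):
    """Interpret a 36-bit word as a two's-complement integer."""
    word &= WORD_MASK
    return word - (1 << 36) if word & SIGN_BIT else word


def caig_skips(word, e):
    """CAIG AC,E: skip if signed C(AC) is greater than E."""
    return to_signed(word) > e


def tlnn_skips(word, mask):
    """TLNN AC,mask: skip if any masked left-half bit is non-zero."""
    left = (word >> 18) & HALF_MASK
    return (left & mask & HALF_MASK) != 0


low_udb = (0o012345 << 18) | 3   # left half below 400000
high_udb = (0o412345 << 18) | 3  # left half has the sign bit on
```

     For `low_udb` both tests skip.  For `high_udb` the word is
negative, so the CAIG test fails to skip even though the left half is
filled in, while the TLNN test still skips.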
********************************************************************************
  
  
                             EDIT 11105  FOR 702
  
[SYMPTOM]
  
     It is possible to add a unit to the ASL even if there is no  pack
on the drive, if the unit previously on the drive had swapping space.
  
  
[DIAGNOSIS]
  
     UNIK4S was never zeroed at dismount time.
  
  
[CURE]
  
     Zero UNIK4S at dismount time.
********************************************************************************
  
  
                             EDIT 11106  FOR 702
  
[SYMPTOM]
  
     IME trying to refresh a structure.
  
  
[DIAGNOSIS]
  
     If the RIB for FE.SYS is bad, the  location  containing  the  RIB
block  number  on disk is never cleared to indicate there is no usable
RIB for FE.SYS.
  
  
[CURE]
  
     Clear FERBD when a bad RIB is detected.
********************************************************************************
  
  
                             EDIT 11113  FOR 702
  
[SYMPTOM]
  
     Stopcodes DOM, CMU.
  
  
[DIAGNOSIS]
  
     IPCSER occasionally can return the MM resource  when  it  doesn't
own  it.  This can only happen on a page mode send when the page to be
sent has already been paged out of the sender's working set.
  
  
[CURE]
  
     Rearrange the code so that MM is obtained  and  released  at  the
proper places.
********************************************************************************
  
  
                             EDIT 11115  FOR 702
  
[SYMPTOM]
  
MPX-controlled PIM mode I/O computes wrong byte counts.
  
  
[DIAGNOSIS]
  
A register is getting overwritten with junk.
  
  
[CURE]
  
Add a "LDB T3,[POINT 6,P1,11]" instruction at STOCN2+7L.
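
For readers unfamiliar with byte pointers: POINT 6,P1,11 describes the
6-bit byte in P1 whose rightmost bit is bit 11, where bit 0 is the
most significant of the 36 bits.  A minimal Python model of that LDB
follows; the function name is hypothetical.

```python
def ldb(word, size, pos):
    """Model LDB through a "POINT size,addr,pos" byte pointer.

    Returns the size-bit byte of a 36-bit word whose rightmost bit is
    bit `pos`, counting bit 0 as the most significant bit.
    """
    shift = 35 - pos
    return (word >> shift) & ((1 << size) - 1)
```

So the added instruction reloads T3 with bits 6 through 11 of P1.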
********************************************************************************
  
  
                             EDIT 11117  FOR 702
  
[SYMPTOM]
  
     Stopcode KAF using UU.SOE on input.
  
  
[DIAGNOSIS]
  
     If IODEND but not IOEND gets set, UUOCON gets confused and  keeps
trying  to  call  the  driver  in one place and refusing to call it in
another.
  
  
[CURE]
  
     Give the error return to the user when IODEND is set and UU.SOE
is set on an input.
********************************************************************************
  
  
                             EDIT 11118  FOR 702
  
[SYMPTOM]
  
     FILOP.  update RIB function returns IOIMPM for non-disk devices.
  
  
[DIAGNOSIS]
  
     Function .FOURB should be a no-op for non-disk devices.
  
  
[CURE]
  
     Mark the function as requiring  a  disk  in  the  FILOP  dispatch
table.
********************************************************************************
  
  
                             EDIT 11119  FOR 702
  
[SYMPTOM]
  
     Magtape diagnostics hang in EW doing a DIAG. UUO to assign a tape
drive.
  
  
[DIAGNOSIS]
  
     Call to KONWAT never returns.
  
  
[CURE]
  
     Add a PUSHJ P,SETACS.
********************************************************************************
  
  
                             EDIT 11120  FOR 702
  
[SYMPTOM]
  
     Occasional stopcode IMEs while swapping  and  running  the  class
scheduler.
  
  
[DIAGNOSIS]
  
     The multi-threaded swapper calls SQTEST too many times.
  
  
[CURE]
  
     At SCNJOB, go to CHKXPN which  doesn't  call  SQTEST  instead  of
ZCKXPN, which does.
********************************************************************************
  
  
                             EDIT 11121  FOR 702
  
[SYMPTOM]
  
Virtual program has large working set even though the number of pages
it is using is small.
  
[DIAGNOSIS]
  
The  PFH was pulled in during the GET of the program, and had already
been initialized by the time the program did the RESET UUO. This RESET
cleared  the  virtual timer traps, thus preventing PFH from doing any
garbage collection.
  
[CURE]
  
Move the code that clears the virtual timer traps from RESET into RMVPFH.
********************************************************************************
  
  
                             EDIT 11122  FOR 702
  
[SYMPTOM]
  
     Stopcode PAO.
  
  
[DIAGNOSIS]
  
     Incorrectly,  the  VM counters do not get updated for a paged out
IPCF page.
  
  
[CURE]
  
     Update the counters regardless of whether the page is  paged  out
or not.
********************************************************************************
  
  
                             EDIT 11125  FOR 702
  
[SYMPTOM]
  
     FILOP.  UUO function .FOUSO fails when it should not.
  
  
[DIAGNOSIS]
  
     If a file is opened in update mode (.FOSAU) and is read until EOF
is encountered, and a FILOP.  USETO done to block 1, the FILOP.  takes
the error return because of the EOF condition.  This  is  inconsistent
with a USETO which does not check EOF.
  
  
[CURE]
  
     Do not check IODEND.
********************************************************************************
  
  
                             EDIT 11126  FOR CPNSER
  
[SYMPTOM]
  
     New:  Add support for the "keep me" bit implemented by the MCA25.
This  currently  turns  on  "keep  me" for the resident monitor.  At a
later time, we may want to do some extensive performance  analysis  to
see if that's too much or too little.
  
  
[DIAGNOSIS]
  
********************************************************************************
  
  
                             EDIT 11132  FOR 702
  
[SYMPTOM]
  
  
Stopcode KSW.  Issuing a CLOSE (or RELEASE) to a write-locked
tape  with  unwritten  buffers is required for the bug to be
observed.
  
  
[DIAGNOSIS]
  
  
The code which  handles  a  write-lock  condition  on  CLOSE
wanders  through  the  scheduling  code,  then  the  regular
interrupt level code deselects the drive that was previously
selected.
  
  
[CURE]
  
  
Deselect the drive during the handling of the write-lock  so
that  the  tape  schedule  code  will not be invoked at that
time.  We will schedule something later.
********************************************************************************
  
  
                             EDIT 11135  FOR 702
  
[SYMPTOM]
  
     Pathological name limits cannot be  changed  at  MONGEN
time.
  
  
[DIAGNOSIS]
  
     Values were hard coded in.
  
  
[CURE]
  
     Allow the following words to be defined at MONGEN time.
  
        LNMMXL  Maximum length of any pathological name.  Defaults to
                144, maximum possible value is 177.
  
        LNMMAX  Maximum number of pathological names.  Defaults to 77.
********************************************************************************
  
  
                             EDIT 11136  FOR 702
  
[SYMPTOM]
  
     Monitor is too slow.
  
  
[DIAGNOSIS]
  
     We go out of our way to avoid checks.
  
  
[CURE]
  
     Simply do not do the checks.
********************************************************************************
  
  
                             EDIT 11137  FOR 702
  
[SYMPTOM]
  
%CVSNM is wrong
  
[DIAGNOSIS]
  
PC stored in wrong place
  
[CURE]
  
Store PC in .CPSNM when we store %SYSPC.
********************************************************************************
  
  
                             EDIT 11141  FOR 702
  
[SYMPTOM]
  
Cannot simulate SPY UUO with .PAGSP function of PAGE.  using
negative count argument.
  
  
[DIAGNOSIS]
  
Since we want to start mapping in from  the  monitor's  page
zero,  the  left  half of the argument word (source page) is
zero.  NXTPAG checks for a  non-zero  left  half  to  decide
whether to increment both halves or only the right half.
  
  
[CURE]
  
Make SPYPGS call a separate routine which knows that it  has
to increment both halves.
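
The effect can be sketched in Python.  Both functions below are
hypothetical models of the behavior described, not the monitor's
NXTPAG or the new routine; the argument word holds the source page in
the left half and the destination page in the right half.

```python
HALF_MASK = 0o777777


def split(arg):
    """Return (left half, right half) of a 36-bit argument word."""
    return (arg >> 18) & HALF_MASK, arg & HALF_MASK


def join(left, right):
    return ((left & HALF_MASK) << 18) | (right & HALF_MASK)


def next_like_nxtpag(arg):
    """Step to the next page; bump the left half only if non-zero."""
    left, right = split(arg)
    if left != 0:
        left += 1
    return join(left, right + 1)


def next_both_halves(arg):
    """Step to the next page, always incrementing both halves."""
    left, right = split(arg)
    return join(left + 1, right + 1)
```

Starting from source page zero, the first function never advances the
source page, while the second steps source and destination together.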
********************************************************************************
  
  
                             EDIT 11145  FOR 702
  
[SYMPTOM]
  
.STPGM for a program that isn't on SYS: causes the job  to loop
forever.
  
[DIAGNOSIS]
  
There is no check; each time the run fails, it invokes another run.
  
[CURE]
  
Check. If a run that was caused by .STPGM fails, set the program
to run to LOGOUT, so that the job dies cleanly.
********************************************************************************
  
  
                             EDIT 11147  FOR 702
  
[SYMPTOM]
  
     Random stopcodes and resource wait hangs in the presence of heavy
HPQ activity.
  
  
[DIAGNOSIS]
  
     USCHD1  expects  T1  to  contain  JBTSTS(J),   but   it   doesn't
necessarily contain that if we call it from UMPRET in COMMON.
  
  
[CURE]
  
     Change the path on which T1 may not contain JBTSTS(J) so that T1
does contain JBTSTS(J) on exit.
********************************************************************************
  
  
                             EDIT 11151  FOR 702
  
[SYMPTOM]
  
     Monitor too big.
  
  
[DIAGNOSIS]
  
     We clear P3 for privilege checking in TRMOP. UUOs  when  we  only
need to do so once.
  
  
[CURE]
  
     Yes.
********************************************************************************
  
  
                             EDIT 11152  FOR 702
  
[SYMPTOM]
  
     If all CPUs get a KAF stopcode at about the same time, they  will
all loop, and the system must be reloaded by hand.
  
  
[DIAGNOSIS]
  
     The non-policy CPUs just went on ahead and jumped into their ACs,
but  they  never told the policy CPU that they died.  Thus, the policy
CPU is waiting for a role switch to occur, so that it  can  die  as  a
slave CPU as well.
  
  
[CURE]
  
     Have the slave CPUs tell the policy CPU that they died, and  have
the  loop for the policy CPU in CPUSTP in ERRCON check occasionally to
see if it is now the last CPU and should do the reload.   Have  MONBTS
check  this  condition,  so  that  we don't get two dumps for the same
crash on the same CPU.
********************************************************************************
  
  
                             EDIT 11156  FOR 702
  
[SYMPTOM]
  
     Lost clusters, free clusters, stopcode BAZ.
  
[DIAGNOSIS]
  
     The  user  does  a  RENAME  (or  updating  ENTER)   and
specifies  a  value  of .RBALC which will truncate the file.
If .RBALC points somewhere within the last retrieval pointer
of a RIB, then we end up deallocating the wrong cluster.
  
     SCNPTR leaves DEVLFT one too low (the count  of  blocks
left  in the group).  The intention is to leave one block in
reserve for the spare RIB.   This  throws  the  deallocation
off.
  
[CURE]
  
     Compensate by making DEVBLK one too high.
********************************************************************************
  
  
                             EDIT 11161  FOR 702
  
[SYMPTOM]
  
  
Terminals with TERMINAL STOP and a non-zero PAGE length will
go  into pause as though the monitor page limit had run out,
even though it has gone into terminal input wait  less  than
the page limit number of lines ago.
  
  
[DIAGNOSIS]
  
The monitor will only clear the page count if  there  is  no
output to be done before it goes into input wait.  Often  it
will go into input wait in the code  while there  are  still
characters to be output to the terminal.
  
  
[CURE]
  
At TWAITL, have the terminal wait until all characters  have
been sent, then let it go into input wait.
********************************************************************************
  
  
                             EDIT 11162  FOR 702
  
[SYMPTOM]
  
The .TOHPS TRMOP.  function does not work for DNxx  terminal
lines.
  
  
[DIAGNOSIS]
  
The routine TOPHPS in SCNSER sets  its  internal  horizontal
position  counter, but does not tell the front end about the
change.
  
  
[CURE]
  
After we set the new position internally, call SETCHP before
returning  from  TOPHPS  to  tell  the  front  end  the  new
horizontal position of the cursor.


********************************************************************************
  
  
                             EDIT 11218  FOR 702
  
[SYMPTOM]
  
     KAF while formatting a disk with DDRPI.
  
[DIAGNOSIS]
  
     Super I/O causes the  disk  cache  to  be  swept.   The
number  of blocks to sweep is computed by dividing DEVDMP by
200.   This  is  not  correct  as  the  block   size   isn't
necessarily  200  words  (headers  and trailers are included
during formatting).
  
[CURE]
  
     Don't sweep cache  if  the  pack  isn't  mounted  as  a
structure.  Don't sweep for negative block numbers.


********************************************************************************
  
  
                             EDIT 11234  FOR 702
  
  
  
  
[SYMPTOM]
  
     Stopcode IME in virtual programs while running microcode  version
336 with the MCA25 installed.
  
[DIAGNOSIS]
  
     The "V" bit in the page fail word returned by the MAP instruction
no  longer means the same thing as it used to.  FLTCHK decides that if
the "V" bit is not on, then the address will not fault;  this  is  not
always  the  case in a virtual program.  This bit is now the "Keep-me"
bit and will never be on for version 336 of  the  microcode  with  the
MCA25 installed, as the microcode does not support the new features of
the MCA25.
  
[CURE]
  
     Don't check the "V" bit in  determining  if  the  reference  will
fault.
********************************************************************************
  
  
  
END OF  TOPS-10-KS-V702