EDIT DESCRIPTIONS FOR TOPS-10-KS-V702
EDIT 11063 FOR 702
[SYMPTOM]
WEM stopcode.
[DIAGNOSIS]
Incorrect looping in module NETDEV.MAC, routine VTMDEQ when
dequeuing an LDB from the VTMQUE.
[CURE]
Move label VTMDE1 up one line. See related filcom. This supersedes
PCO 10-NETSER-085.
********************************************************************************
EDIT 11065 FOR 702
[SYMPTOM]
If a user connected to a local RSX20F terminal line sets host and
sets TTY FORM, it does not take effect.
[DIAGNOSIS]
The status message is not inspected for the form feed bit.
[CURE]
Make NETDEV look at the bit. See related filcom.
********************************************************************************
EDIT 11073 FOR 702
[SYMPTOM]
Bad code in PAGOUT, resulting in the JOBINT interrupt block
being paged out.
[DIAGNOSIS]
The code compares page numbers against addresses.
[CURE]
Compare pages against pages and check the end of the block
as well as the beginning.
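
A minimal sketch of the cure, not the monitor's code. It assumes the
standard 512-word (octal 1000) PDP-10 page; the function name and its
arguments are hypothetical. Both ends of the block are converted to page
numbers, so pages are only ever compared against pages.

```python
PAGE_SIZE = 0o1000  # 512 words per page


def block_touches_page(block_addr, block_len, page):
    """Return True if any word of the block lies in the given page.

    Hypothetical helper: the first and last word addresses of the block
    are both converted to page numbers before the comparison.
    """
    first_page = block_addr // PAGE_SIZE
    last_page = (block_addr + block_len - 1) // PAGE_SIZE
    return first_page <= page <= last_page
```

A block that starts near the end of one page and runs into the next is
reported as touching both pages, which a check of the starting address
alone would miss.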
********************************************************************************
EDIT 11083 FOR 702
[SYMPTOM]
Various stopcodes during swapping. The most likely candidates
are resource confusion related stopcodes, IMEs, or UILs.
[DIAGNOSIS]
MCO 11083 as it appears in 7.02 is incomplete and can return to
scan the swap-out queues without owning the scheduler interlock.
[CURE]
Hold onto the scheduler interlock until after we're sure we won't
have to scan the queues again.
********************************************************************************
EDIT 11085 FOR 702
[SYMPTOM]
Problems with RP20s: 1) A servo problem on one
drive can cause other drives to drop off line. 2) Running
diagnostics on one drive (from the maintenance panel) can
cause trouble for the other drives (KCP stopcode, drive off
line, spurious I/O error).
[DIAGNOSIS]
The device driver can't issue a new command if the
controller is still busy from the previous command (i.e.
the GO bit is still up). The above two conditions can cause
the controller to remain busy for longer than anticipated.
The monitor times out and decides to issue the next command.
This hangs the entire subsystem.
[CURE]
Exit from the device driver without issuing the new
command. Wait for the previous command to signal an
interrupt, then tell FILIO that both commands are done (this
is a lie). If the new command is a transfer, then lie to
FILIO and tell him "position done". FILIO will think he's
recovering from a hung transfer and will re-issue the
transfer command. If the new command is a position, then
tell FILIO "recal done". FILIO will think he's recovering
from a hung position and will re-issue the position command.
Until this patch is installed, you should not attempt
to run diagnostics from the maintenance panel during
timesharing. This patch does not affect user mode
diagnostics (i.e. those run via DIAMON).
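
The decision described in the cure can be sketched as follows. This is
an illustrative model only: the function names and the string values
are hypothetical, and the real driver works with controller registers
and interrupts, not strings.

```python
def completion_to_report(new_command):
    """Completion status reported to FILIO for a command that was held
    back because the controller was still busy.

    Hypothetical model of the cure: the report is deliberately one
    step behind, so FILIO believes it is recovering from a hung
    operation and re-issues the command itself.
    """
    if new_command == "transfer":
        return "position done"
    if new_command == "position":
        return "recal done"
    raise ValueError("unknown command: %r" % (new_command,))


def handle_request(controller_busy, new_command):
    """Return ('issue', None) when the controller is free, otherwise
    ('defer', report), where report is what FILIO is told once the
    previous command has interrupted."""
    if not controller_busy:
        return ("issue", None)
    return ("defer", completion_to_report(new_command))
```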
********************************************************************************
EDIT 11090 FOR 702
[SYMPTOM]
PULSAR gets undeserved hard data errors after positioning
magtapes. Labels on tapes could go undetected, allowing tapes to be
mounted as unlabeled.
[DIAGNOSIS]
Backspace operations on a TM02/TM03 return correctable data/CRC
error and frame count errors when BOT is reached and the density is
set to 800 BPI.
[CURE]
Ignore these errors as they're not meaningful since the
controller already told us we successfully reached BOT.
********************************************************************************
EDIT 11093 FOR 702
[SYMPTOM]
Stopcode IBZ.
[DIAGNOSIS]
Doing an OPEN for a channel based on a channel (as a result of
UDXDDB) re-uses the old JDA bits on the new (copied) DDB.
[CURE]
TLZ F,-1
********************************************************************************
EDIT 11094 FOR 702
[SYMPTOM]
Too easy for a naive user to assign all the free TTY DDBs and
cause "?Job capacity exceeded" messages.
[DIAGNOSIS]
A simple double control-C on an assigned or initialized terminal
leaves the job with a detached DDB hanging around, which is often only
accessible via DEASTY (and thus not useable by the program). The
current method of determining when to kill a TTY DDB makes the
following checks:
1) If TTYATC or ASSPRG is lit, don't kill it, it's still
accessible.
2) If ASSCON is lit, don't kill it, it's still accessible.
3) Otherwise, kill it off.
The problem is that point 2 is not always valid.
[CURE]
Make the checks more robust as follows:
1) If TTYATC or ASSPRG is lit, don't kill it, it's still
accessible.
2) If ASSCON is clear, kill it, it's useless.
3) If DEVNAM is still valid (i.e., not clobbered by TTYDT1),
keep it around, it's still accessible.
4) If DEVLOG is non-zero, it's still accessible via the logical
name, so keep it.
5) Otherwise, kill it off.
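
The old and new checks can be compared with a small model. This is a
sketch under stated assumptions: the flags are passed as plain booleans
and integers, and the function names are hypothetical. The monitor
tests bits in the DDB instead.

```python
def old_should_kill(ttyatc, assprg, asscon):
    """Old rule: keep the DDB if TTYATC, ASSPRG or ASSCON is lit."""
    return not (ttyatc or assprg or asscon)


def new_should_kill(ttyatc, assprg, asscon, devnam_valid, devlog):
    """New rule, following the five numbered checks of the cure."""
    if ttyatc or assprg:
        return False        # 1) still accessible
    if not asscon:
        return True         # 2) useless
    if devnam_valid:
        return False        # 3) physical name not clobbered
    if devlog != 0:
        return False        # 4) reachable via the logical name
    return True             # 5) otherwise, kill it off
```

The case that differs is a DDB with ASSCON lit, a clobbered DEVNAM and
a zero DEVLOG: the old rule keeps it, the new rule frees it.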
********************************************************************************
EDIT 11096 FOR 702
[SYMPTOM]
IME during FILOP function .FORRC (rewrite RIB if
changed). This problem is frequently exercised by RMS.
[DIAGNOSIS]
Junk in LH of U looks like section number.
[CURE]
Don't put junk in LH of U.
********************************************************************************
EDIT 11097 FOR 702
[SYMPTOM]
Disk quotas messed up.
[DIAGNOSIS]
The monitor does exactly what it's told. The problem
is, the monitor isn't defensive enough. It's too easy for
a program to tell the monitor to do the wrong thing.
Scenario: Since logging in, job X has created several
very large files. RIBUSD therefore differs from UFBTAL by a
large amount. RIBUSD is the word in the RIB of the UFD that
keeps track of how many blocks are used. UFBTAL is a "copy"
of RIBUSD which is kept in core. RIBUSD is not updated
while the job is logged in. Only UFBTAL is accurate. Job Y
then runs a privileged program to change, for example, the
spool name on job X's UFD. To do this, the program does a
LOOKUP of the UFD, alters RIBSPL, and then does a RENAME.
Now the LOOKUP returned a value of RIBUSD which was known to
be inaccurate. This value was passed, in turn, to the
RENAME UUO. Being a privileged program, the RENAME is
allowed to set the quota to anything it wants. Thus UFBTAL
is set to RIBUSD (which is known to be inaccurate). The
correct procedure is, of course, to set RIBUSD to -1 before
doing the RENAME UUO. This indicates that you don't want to
change the value. You'd be surprised at how many programs
don't set RIBUSD to -1. DIP, BACKUP, and LOGIN, for
example.
[CURE]
The LOOKUP UUO shouldn't return the value of RIBUSD as
found in the RIB. Get the value from UFBTAL instead. That
way it has a chance of working if a program forgets the
SETOM.
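
The scenario and the cure can be modelled in a few lines. This is a
sketch only: the class, the function names and the "fixed" switch are
hypothetical, and the real UUOs take argument blocks rather than
integers.

```python
NO_CHANGE = -1  # RIBUSD value meaning "leave the count alone"


class Ufd:
    def __init__(self, ribusd, ufbtal):
        self.ribusd = ribusd  # on-disk count, stale while the job is logged in
        self.ufbtal = ufbtal  # in-core count, the accurate one


def lookup(ufd, fixed=True):
    """Value of RIBUSD handed back to the program by LOOKUP.

    fixed=False models the old behavior (value taken from the RIB);
    fixed=True models the cure (value taken from UFBTAL).
    """
    return ufd.ufbtal if fixed else ufd.ribusd


def rename(ufd, ribusd_arg):
    """Privileged RENAME: sets UFBTAL unless the argument is -1."""
    if ribusd_arg != NO_CHANGE:
        ufd.ufbtal = ribusd_arg
```

A program that forgets the SETOM passes the LOOKUP value straight back
to RENAME. With the old LOOKUP that clobbers UFBTAL with the stale
count; with the new LOOKUP it writes back the value that was already
there.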
********************************************************************************
EDIT 11103 FOR 702
[SYMPTOM]
MERGE of an execute only file does not return an error code.
[DIAGNOSIS]
Since this is a SAVE/GET, FILSER does not supply an error
code; COMCON makes the decision.
[CURE]
Supply an error code of PRTERR (2) when rejecting a merge
because the file is execute only.
********************************************************************************
EDIT 11104 FOR 702
[SYMPTOM]
%LDSWP has wrong value stored in it at ONCE-only time. One
symptom of this is the failure of OMOUNT to run.
[DIAGNOSIS]
The code wants to move the right half-word into the left
half-word if there are no units in the ASL, and checks for this by
seeing if the entire word is greater than the maximum number of units
possible in the ASL (if a unit in the ASL was found, its address would
have already been put in the left half). This doesn't work in 7.02
when the UDB addresses (the contents of the left half) have the sign
bit on because they are greater than 400000.
[CURE]
Use TLNN T1,-1 instead of CAIG T1,7.
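
Why the two tests differ can be shown with 36-bit arithmetic. The
sketch below models only the skip conditions of the two instructions
(CAIG skips if the AC, taken as a signed word, is greater than the
immediate value; TLNN skips if any masked bit in the left half is
one). The helper names are hypothetical.

```python
HALF_MASK = 0o777777          # 18 bits
WORD_MASK = (1 << 36) - 1     # 36 bits


def make_word(left, right):
    """Build a 36-bit word from two 18-bit halves."""
    return ((left & HALF_MASK) << 18) | (right & HALF_MASK)


def as_signed36(word):
    """Interpret a 36-bit word as a two's complement signed value."""
    word &= WORD_MASK
    return word - (1 << 36) if word & (1 << 35) else word


def caig_skips(ac, e):
    """CAIG AC,E: skip if the signed AC is greater than E."""
    return as_signed36(ac) > e


def tlnn_skips(ac, mask):
    """TLNN AC,mask: skip if any masked left-half bit is one."""
    return ((ac >> 18) & mask & HALF_MASK) != 0
```

With a UDB address of 400000 or above in the left half, the word is
negative, so the CAIG test does not skip even though the left half is
in use. The TLNN test looks only at whether the left half is non-zero.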
********************************************************************************
EDIT 11105 FOR 702
[SYMPTOM]
It is possible to add a unit to the ASL even if there is no pack
on the drive, if the unit previously on the drive had swapping space.
[DIAGNOSIS]
UNIK4S was never zeroed at dismount time.
[CURE]
Zero UNIK4S at dismount time.
********************************************************************************
EDIT 11106 FOR 702
[SYMPTOM]
IME trying to refresh a structure.
[DIAGNOSIS]
If the RIB for FE.SYS is bad, the location containing the RIB
block number on disk is never cleared to indicate there is no usable
RIB for FE.SYS.
[CURE]
Clear FERBD when a bad RIB is detected.
********************************************************************************
EDIT 11113 FOR 702
[SYMPTOM]
Stopcodes DOM, CMU.
[DIAGNOSIS]
IPCSER occasionally can return the MM resource when it doesn't
own it. This can only happen on a page mode send when the page to be
sent has already been paged out of the sender's working set.
[CURE]
Rearrange the code so that MM is obtained and released at the
proper places.
********************************************************************************
EDIT 11115 FOR 702
[SYMPTOM]
MPX-controlled PIM mode I/O computes wrong byte counts.
[DIAGNOSIS]
A register is getting overwritten with junk.
[CURE]
Add a "LDB T3,[POINT 6,P1,11]" instruction at STOCN2+7L.
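
For readers unfamiliar with byte pointers: POINT 6,P1,11 describes a
6-bit byte in P1 whose rightmost bit is bit 11, with bit 0 being the
most significant bit of the 36-bit word. A minimal model of what the
LDB fetches (the function name is hypothetical):

```python
def ldb(word, size, right_bit):
    """Load the byte described by POINT size,addr,right_bit.

    PDP-10 bits are numbered 0 (most significant) to 35 (least
    significant), so the byte sits (35 - right_bit) bits above the
    low end of the word.
    """
    shift = 35 - right_bit
    return (word >> shift) & ((1 << size) - 1)
```

For POINT 6,P1,11 this returns bits 6 through 11 of the word.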
********************************************************************************
EDIT 11117 FOR 702
[SYMPTOM]
Stopcode KAF using UU.SOE on input.
[DIAGNOSIS]
If IODEND but not IOEND gets set, UUOCON gets confused and keeps
trying to call the driver in one place and refusing to call it in
another.
[CURE]
If IODEND is set and UU.SOE is set on an input, give the error
return to the user.
********************************************************************************
EDIT 11118 FOR 702
[SYMPTOM]
FILOP. update RIB function returns IOIMPM for non-disk devices.
[DIAGNOSIS]
Function .FOURB should be a no-op for non-disk devices.
[CURE]
Mark the function as requiring a disk in the FILOP dispatch
table.
********************************************************************************
EDIT 11119 FOR 702
[SYMPTOM]
Magtape diagnostics hang in EW doing a DIAG. UUO to assign a tape
drive.
[DIAGNOSIS]
Call to KONWAT never returns.
[CURE]
Add a PUSHJ P,SETACS.
********************************************************************************
EDIT 11120 FOR 702
[SYMPTOM]
Occasional stopcode IMEs while swapping and running the class
scheduler.
[DIAGNOSIS]
The multi-threaded swapper calls SQTEST too many times.
[CURE]
At SCNJOB, go to CHKXPN which doesn't call SQTEST instead of
ZCKXPN, which does.
********************************************************************************
EDIT 11121 FOR 702
[SYMPTOM]
Virtual program has large working set even though the number of pages
it is using is small.
[DIAGNOSIS]
The PFH was pulled in during the GET of the program, and had already
been initialized by the time the program did the RESET UUO. This RESET
cleared the virtual timer traps, thus preventing PFH from doing any
garbage collection.
[CURE]
Move the code that clears the virtual timer traps from RESET into RMVPFH.
********************************************************************************
EDIT 11122 FOR 702
[SYMPTOM]
Stopcode PAO.
[DIAGNOSIS]
Incorrectly, the VM counters do not get updated for a paged out
IPCF page.
[CURE]
Update the counters regardless of whether the page is paged out
or not.
********************************************************************************
EDIT 11125 FOR 702
[SYMPTOM]
FILOP. UUO function .FOUSO fails when it should not.
[DIAGNOSIS]
If a file is opened in update mode (.FOSAU) and is read until EOF
is encountered, and a FILOP. USETO done to block 1, the FILOP. takes
the error return because of the EOF condition. This is inconsistent
with a USETO which does not check EOF.
[CURE]
Do not check IODEND.
********************************************************************************
EDIT 11126 FOR CPNSER
[SYMPTOM]
New: Add support for the "keep me" bit implemented by the MCA25.
This currently turns on "keep me" for the resident monitor. At a
later time, we may want to do some extensive performance analysis to
see if that's too much or too little.
[DIAGNOSIS]
[CURE]
********************************************************************************
EDIT 11132 FOR 702
[SYMPTOM]
Stopcode KSW. Issuing a CLOSE (or RELEASE) to a write-locked
tape with unwritten buffers is required for the bug to be
observed.
[DIAGNOSIS]
The code which handles a write-lock condition on CLOSE
wanders through the scheduling code, then the regular
interrupt level code deselects the drive that was previously
selected.
[CURE]
Deselect the drive during the handling of the write-lock so
that the tape schedule code will not be invoked at that
time. We will schedule something later.
********************************************************************************
EDIT 11135 FOR 702
[SYMPTOM]
Pathological name limits cannot be changed at MONGEN
time.
[DIAGNOSIS]
Values were hard coded in.
[CURE]
Allow the following words to be defined at MONGEN time.
LNMMXL Maximum length of any pathological name. Defaults to
144, maximum possible value is 177.
LNMMAX Maximum number of pathological names. Defaults to 77.
********************************************************************************
EDIT 11136 FOR 702
[SYMPTOM]
Monitor is too slow.
[DIAGNOSIS]
We go out of our way to avoid checks.
[CURE]
Simply do not do the checks.
********************************************************************************
EDIT 11137 FOR 702
[SYMPTOM]
%CVSNM is wrong.
[DIAGNOSIS]
PC stored in wrong place.
[CURE]
Store PC in .CPSNM when we store %SYSPC.
********************************************************************************
EDIT 11141 FOR 702
[SYMPTOM]
Cannot simulate SPY UUO with .PAGSP function of PAGE. using
negative count argument.
[DIAGNOSIS]
Since we want to start mapping in from the monitor's page
zero, the left half of the argument word (source page) is
zero. NXTPAG checks for a non-zero left half to decide
whether to increment both halves or only the right half.
[CURE]
Make SPYPGS call a separate routine which knows that it has
to increment both halves.
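
The difference between the two stepping rules can be modelled as
below. This is a sketch of the behavior the diagnosis describes, not
of NXTPAG itself; the function names are hypothetical and the argument
word is represented as a (left half, right half) pair.

```python
HALF_MASK = 0o777777  # 18 bits


def step_nxtpag_style(left, right):
    """Rule described in the diagnosis: increment both halves only if
    the left half is non-zero, otherwise only the right half."""
    if left != 0:
        left = (left + 1) & HALF_MASK
    right = (right + 1) & HALF_MASK
    return left, right


def step_both_halves(left, right):
    """Rule needed for SPY pages: always increment both halves, even
    when the source page in the left half starts at zero."""
    return (left + 1) & HALF_MASK, (right + 1) & HALF_MASK
```

Starting from source page zero, the first rule never advances the
source page, while the second walks both halves together.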
********************************************************************************
EDIT 11145 FOR 702
[SYMPTOM]
.STPGM for a program that isn't on SYS: causes the job to loop
forever.
[DIAGNOSIS]
No check is made; each time the run fails, it invokes another run.
[CURE]
Check. If a run that was caused by .STPGM fails, set the program
to run to LOGOUT, so that the job dies cleanly.
********************************************************************************
EDIT 11147 FOR 702
[SYMPTOM]
Random stopcodes and resource wait hangs in the presence of heavy
HPQ activity.
[DIAGNOSIS]
USCHD1 expects T1 to contain JBTSTS(J), but it doesn't
necessarily contain that if we call it from UMPRET in COMMON.
[CURE]
Change the path on which T1 may not contain JBTSTS(J) so that T1
does contain JBTSTS(J) on exit.
********************************************************************************
EDIT 11151 FOR 702
[SYMPTOM]
Monitor too big.
[DIAGNOSIS]
We clear P3 for privilege checking in TRMOP. UUOs when we only
need to do so once.
[CURE]
Yes.
********************************************************************************
EDIT 11152 FOR 702
[SYMPTOM]
If all CPUs get a KAF stopcode at about the same time, they will
all loop, and the system must be reloaded by hand.
[DIAGNOSIS]
The non-policy CPUs just went on ahead and jumped into their ACs,
but they never told the policy CPU that they died. Thus, the policy
CPU is waiting for a role switch to occur, so that it can die as a
slave CPU as well.
[CURE]
Have the slave CPUs tell the policy CPU that they died, and have
the loop for the policy CPU in CPUSTP in ERRCON check occasionally to
see if it is now the last CPU and should do the reload. Have MONBTS
check this condition, so that we don't get two dumps for the same
crash on the same CPU.
********************************************************************************
EDIT 11156 FOR 702
[SYMPTOM]
Lost clusters, free clusters, stopcode BAZ.
[DIAGNOSIS]
The user does a RENAME (or updating ENTER) and
specifies a value of .RBALC which will truncate the file.
If .RBALC points somewhere within the last retrieval pointer
of a RIB, then we end up deallocating the wrong cluster.
SCNPTR leaves DEVLFT one too low (the count of blocks
left in the group). The intention is to leave one block in
reserve for the spare RIB. This throws the deallocation
off.
[CURE]
Compensate by making DEVBLK one too high.
********************************************************************************
EDIT 11161 FOR 702
[SYMPTOM]
Terminals with TERMINAL STOP and a non-zero PAGE length will
go into pause as though the monitor page limit had run out,
even though it has gone into terminal input wait less than
the page limit number of lines ago.
[DIAGNOSIS]
The monitor will only clear the page count if there is no
output to be done before it goes into input wait. Often it
will go into input wait in the code while there are still
characters to be output to the terminal.
[CURE]
At TWAITL, have the terminal wait until all characters have
been sent, then let it go into input wait.
********************************************************************************
EDIT 11162 FOR 702
[SYMPTOM]
The .TOHPS TRMOP. function does not work for DNxx terminal
lines.
[DIAGNOSIS]
The routine TOPHPS in SCNSER sets its internal horizontal
position counter, but does not tell the front end about the
change.
[CURE]
After we set the new position internally, call SETCHP before
returning from TOPHPS to tell the front end the new
horizontal position of the cursor.
********************************************************************************
EDIT 11218 FOR 702
[SYMPTOM]
KAF while formatting a disk with DDRPI.
[DIAGNOSIS]
Super I/O causes the disk cache to be swept. The
number of blocks to sweep is computed by dividing DEVDMP by
200. This is not correct as the block size isn't
necessarily 200 words (headers and trailers are included
during formatting).
[CURE]
Don't sweep cache if the pack isn't mounted as a
structure. Don't sweep for negative block numbers.
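
A small model of the arithmetic and of the cure. It assumes the value
being divided is a word count, and the per-block overhead used in the
usage note is a made-up figure chosen only to show the effect. The
function names are hypothetical.

```python
DATA_WORDS_PER_BLOCK = 0o200  # 128 data words in a normal disk block


def blocks_by_division(word_count):
    """Block count as the old code computed it: words divided by 200
    (octal). Correct only when every block is exactly 200 words."""
    return word_count // DATA_WORDS_PER_BLOCK


def should_sweep_cache(mounted_as_structure, block_number):
    """Cure: sweep only for packs mounted as a structure, and never
    for negative block numbers."""
    return mounted_as_structure and block_number >= 0
```

If each formatted block carried, say, 8 extra words of header and
trailer, then blocks_by_division(100 * (0o200 + 8)) would not return
100, so the sweep would cover the wrong number of blocks.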
********************************************************************************
EDIT 11234 FOR 702
[SYMPTOM]
Stopcode IME in virtual programs while running microcode version
336 with the MCA25 installed.
[DIAGNOSIS]
The "V" bit in the page fail word returned by the MAP instruction
no longer means the same thing as it used to. FLTCHK decides that if
the "V" bit is not on, then the address will not fault; this is not
always the case in a virtual program. This bit is now the "Keep-me"
bit and will never be on for version 336 of the microcode with the
MCA25 installed, as the microcode does not support the new features of
the MCA25.
[CURE]
Don't check the "V" bit in determining whether the reference will
fault.
********************************************************************************
END OF TOPS-10-KS-V702