7-documentation/tops20.tco
TCO-number: 7.1000
Written-by: MCCOLLUM Creation-date: 24-Nov-86 13:27:38
Edit-date: 5-Jan-87 14:09:29
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Problem:
This is a test TCO for release 7.0
Diagnosis:
As above.
Solution:
Write release 7.0
[End of TCO 7.1000]
TCO-number: 7.1002
Written-by: RASPUZZI Creation-date: 28-May-87 15:36:47
Edited-by: RASPUZZI Edit-date: 5-Jun-87 15:09:14
Edit-checked: Yes Document: No TCO-tested: No
Maintenance-release: No Hardware-related: Yes
Program: MONITOR
Routines-affected: APRSRV GLOB PAGEM PAGUTL PHYKLP PHYKNI
PHYP2 PHYP4 PHYSIO STG
Problem:
Some of the I/O that TOPS-20 does uses the simulation of PMOVE/PMOVEM.
These instructions now exist in the KL microcode.
Diagnosis:
It would be advantageous to use these instructions in the monitor instead
of simulating them.
Solution:
Implement PMOVE/PMOVEM instructions in several modules that reference
physical memory and watch performance get better (hopefully).
[End of TCO 7.1002]
TCO-number: 7.1003
Written-by: RASPUZZI Creation-date: 28-May-87 16:21:21
Edited-by: RASPUZZI Edit-date: 5-Jun-87 15:10:58
Edit-checked: Yes Document: Yes TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: MEXEC
Problem:
During system startup, a brain damaged operator can walk away from the CTY
and leave the "Why Reload?" and "Run CHECKD?" questions unanswered.
Diagnosis:
The monitor does a RDTTY% to obtain an answer from these questions and
the RDTTY% does not have a timeout feature.
Solution:
Redo the code around the two questions for the benefit of these neanderthal
operators so that the questions will timeout in 60 seconds and the system
will continue to boot.
[End of TCO 7.1003]
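The timeout behavior described in TCO 7.1003 can be sketched as follows. This is only an illustration in Python, not the MEXEC code: the function name ask_with_timeout, the read_line callback, and the empty-string default are all hypothetical, and the only detail taken from the TCO is the 60 second limit.

```python
import queue
import threading


def ask_with_timeout(prompt, read_line, timeout_seconds=60.0, default=""):
    """Return the operator's answer, or `default` if none arrives in time.

    `read_line` is a hypothetical blocking reader standing in for RDTTY%.
    """
    answers = queue.Queue(maxsize=1)

    def reader():
        # Blocks until the operator types something (possibly forever).
        answers.put(read_line(prompt))

    # Daemon thread: an unanswered question must not hold up the boot.
    threading.Thread(target=reader, daemon=True).start()
    try:
        return answers.get(timeout=timeout_seconds)
    except queue.Empty:
        # Nobody answered; carry on with the default.
        return default
```

The blocking read is left alone and the wait is bounded from outside, which mirrors the diagnosis that the read itself has no timeout feature.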
TCO-number: 7.1005
Written-by: GSCOTT Creation-date: 29-May-87 10:38:36
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: pagutl
Problem: CFZCNT BUGHLTs
Diagnosis:
Between the time that DDMPF selects an OFN for DDOCFS to process and DDOCFS
starts working on the OFN at process level, a vote can come in at interrupt
level that causes the OFN to become cached. The code at DDOCFS+12 attempts
to detect this and avoid doing anything with the cached OFN, and goes
OKSKED then goes to DDOCF1, which calls CFSFOD. CFSFOD was already called
when we cached the OFN, and should not be called again after the OFN is
cached.
Solution: Don't call CFSFOD if the OFN is cached; just JRST down to DDGOD.
[End of TCO 7.1005]
TCO-number: 7.1006
Written-by: GSCOTT Creation-date: 1-Jun-87 19:03:36
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: MEXEC
Problem:
Various problems with edit 7456 to 6.1 monitor
Diagnosis:
There was one edit lost and there are three minor changes needed to MEXEC
to make it all work right.
Solution:
Fix MEXEC to report proper times on logout, not account for not-logged-in
jobs, and correct entries in LOGLST and LOGLSD tables.
[End of TCO 7.1006]
TCO-number: 7.1009
Written-by: RASPUZZI Creation-date: 3-Jun-87 10:53:15
Edited-by: RASPUZZI Edit-date: 5-Jun-87 15:11:53
Edit-checked: Yes Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: SCAPAR
Problem:
There is no way to determine how many words are in the user area
of an SCA buffer.
Diagnosis:
No one defined a mnemonic for it.
Solution:
Add C%MUDA (Maximum User Data Area) to SCAPAR so anyone who needs to
know this can use C%MUDA.
[End of TCO 7.1009]
TCO-number: 7.1010
Written-by: MCCOLLUM Creation-date: 5-Jun-87 11:25:43
Edit-checked: No Document: Yes TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: ENACT RUNDI1
Related-SPR: 21635
Problem:
ILMNRF BUGHLTs when ACCOUNTS-TABLE.BIN is an empty file.
Diagnosis:
Routine ENACT is called to map the first page of the ACCOUNTS-TABLE.BIN
into the monitor's address space at location HSHPG. No check is made to
see if the file is empty. Later when this page is touched by routine
VERACT, an ILMNRF BUGHLT results if the file was, in fact, empty.
Solution:
In routine ENACT, check to see if the file is zero pages long. If it is,
don't attempt to map in the first page of the file and return an error
to the user.
Also, fix up routine RUNDI1 in MEXEC that calls ENACT to display a more
useful error message when ENACT fails.
[End of TCO 7.1010]
TCO-number: 7.1011
Written-by: LOMARTIRE Creation-date: 15-Jun-87 09:03:32
Edit-checked: Yes Document: Yes TCO-tested: Yes
Maintenance-release: No Hardware-related: No
Program: Monitor
Routines-affected: CFSJYN
Problem:
When TOPS-20 initializes, one of its many functions is to
ensure that it has "joined" the cluster completely. In this
case, "joining" means that this node has an open virtual circuit
to every other node on the CI that is answering REQUEST-IDs and
that there exists a CFS connection to all such nodes which are KLs
running TOPS-20. This "joining" check is done early in the
initialization of the system and before any shared file I/O can
take place. This process ensures that the file system remains
intact while systems leave and enter the cluster.
However, sometimes there are problems with this "joining"
process. When this occurs, the system appears hung and, until
the problem is resolved, will not initialize.
Diagnosis:
When this occurs, it would be helpful to indicate what the problem
is so that some steps can be taken to resolve it.
The blocking can be caused by two factors:
1. The system sees a node on the CI which is answering
REQUEST-IDs but a System Block has not yet been formed by the
CI Port Driver (PHYKLP). This can occur if the
START/STACK/ACK sequence is not completing.
2. The system sees another TOPS-20 node on the CI, for which a
System Block has been formed, but for which there does not
exist a CFS connection.
Solution:
When one of these problems is detected in the "joining"
process, a message will be printed on the CTY describing the
condition. This message will appear at most once a minute for
any node to which "joining" is blocked. Also, no message will be
printed for the first 5 seconds of these conditions to allow them
to resolve themselves. The messages will be of the following
format:
%CANNOT JOIN CLUSTER WITH NODE nn BECAUSE: No System Block created
- OR -
%CANNOT JOIN CLUSTER WITH NODE nn BECAUSE: No CFS connection
Note that the node number will be printed in decimal.
[End of TCO 7.1011]
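The message throttling in TCO 7.1011 (nothing for the first 5 seconds, then at most one message a minute per node) might look roughly like the sketch below. It is written in Python for illustration only; the class name JoinReporter, its methods, and the use of a caller-supplied clock in seconds are assumptions, not names from CFSJYN.

```python
GRACE_SECONDS = 5     # no message during the first 5 seconds of blocking
REPEAT_SECONDS = 60   # afterwards, at most one message a minute per node


class JoinReporter:
    """Hypothetical per-node throttle for the 'cannot join' CTY messages."""

    def __init__(self):
        self.first_seen = {}    # node -> time the blocking was first noticed
        self.last_printed = {}  # node -> time of the last message

    def report(self, node, reason, now):
        """Return the message to print, or None if it is suppressed."""
        first = self.first_seen.setdefault(node, now)
        if now - first < GRACE_SECONDS:
            return None  # give the condition a chance to resolve itself
        last = self.last_printed.get(node)
        if last is not None and now - last < REPEAT_SECONDS:
            return None  # already complained about this node recently
        self.last_printed[node] = now
        # Node number is printed in decimal, as the TCO notes.
        return f"%CANNOT JOIN CLUSTER WITH NODE {node:d} BECAUSE: {reason}"

    def resolved(self, node):
        """Forget a node once joining with it is no longer blocked."""
        self.first_seen.pop(node, None)
        self.last_printed.pop(node, None)
```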
TCO-number: 7.1012
Written-by: MCCOLLUM Creation-date: 16-Jun-87 15:03:36
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: CFMDSN
Problem:
CFSSUF BUGHLTs when dismounting a structure.
Diagnosis:
Problems arise when there are two disk drives with the same serial numbers
on the same system. When a structure is mounted, CFS acquires tokens for
the DSN of each disk drive in the structure. If two structures contain
disks with the same serial number, then the DSN token is released when the
first of these two structures is dismounted. Later, when MOUNTR attempts
to dismount the second structure, it tries to gain exclusive access for the
structure in the CFS cluster. Routine CFSSUG attempts to look up the DSN
token in the CFS data base and crashes with a CFSSUF BUGHLT when it is not
found.
Solution:
Having two disk drives with the same serial number is an illegal configuration
in a CFS environment, but TOPS-20 should handle the situation better. Routine
CFMDSN, which registers DSN tokens during a structure mount, will be changed
to fail when the DSN token is already in use for another disk structure. It
will also now issue a CFDDSN BUGCHK informing the system manager or operator
of this illegal configuration so Field Service can change a disk serial
number. This change also requires that routine CFSSDI be changed. CFSSDI
currently removes all CFS tokens acquired when a structure mount failed. This
routine must be changed to release only the DSN tokens that were actually
acquired during the process of the failed mount. This can be done by checking
the HSHCNT in the DSN token resource block. If the value in HSHCNT is one,
the block can be released. If it is greater than one, then HSHCNT is decremented
but the block is not released.
[End of TCO 7.1012]
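The HSHCNT rule at the end of TCO 7.1012 is a plain use-count release. A minimal sketch in Python, assuming the DSN token blocks can be modeled as a dictionary from serial number to HSHCNT (the function name and the dictionary are hypothetical, not the CFSSDI data structures):

```python
def release_dsn_token(blocks, dsn):
    """Apply the HSHCNT rule to one DSN token.

    `blocks` maps a disk serial number to its HSHCNT value.
    Returns True if the block was released, False if only decremented.
    """
    count = blocks[dsn]
    if count == 1:
        # Last user of the token: the block can be released.
        del blocks[dsn]
        return True
    # Someone else still holds the token: decrement but keep the block.
    blocks[dsn] = count - 1
    return False
```

With this rule a failed mount gives back only what it took, so a second structure that legitimately holds the same token keeps it.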
TCO-number: 7.1013
Written-by: RASPUZZI Creation-date: 23-Jun-87 14:27:51
Edited-by: RASPUZZI Edit-date: 23-Jun-87 14:39:02
Edit-checked: Yes Document: No TCO-tested: Yes
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: LLMOP
Problem:
NSKDIS BUGHLTs when using LLMOP% JSYS.
Diagnosis:
Fix fat fingered mistake. Routine RCRWAI was mistakenly moved to
XRESCD when LLMOP was dropped into section 6. Scheduler tests must be
in RESCD.
Solution:
Move RCRWAI out of XRESCD into RESCD.
[End of TCO 7.1013]
TCO-number: 7.1014
Written-by: RASPUZZI Creation-date: 29-Jun-87 15:48:47
Edit-checked: No Document: Yes TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: GTJFN LOOKUP DIRECT STG GLOBS COMND
Problem:
It would be nice if TOPS-20 did partial file recognition and
partial COMND% keyword/switch recognition.
Diagnosis:
COMND, GTJFN and DIRECT do not have that functionality yet.
Solution:
Give COMND, GTJFN and DIRECT the ability to partially recognize
files and command keyword/switches.
[End of TCO 7.1014]
TCO-number: 7.1015
Written-by: LOMARTIRE Creation-date: 30-Jun-87 12:00:17
Edit-checked: Yes Document: No TCO-tested: Yes
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: DRMIO
Problem:
TCSOFN BUGHLTs occur when they are not necessary.
Diagnosis:
During the caching process, it is possible for an OFN to be swapped out before
it is completely cached. So, when it is referenced again (i.e. when it is
uncached), the OFN will have to be swapped in. The current code will TCSOFN
BUGHLT if a cached OFN is swapped in or out. It is not necessary to BUGHLT if
the OFN is swapped in.
Solution:
Only issue the TCSOFN BUGHLT on a disk write of a cached OFN.
[End of TCO 7.1015]
TCO-number: 7.1016
Written-by: RASPUZZI Creation-date: 7-Jul-87 14:14:03
Edited-by: RASPUZZI Edit-date: 7-Jul-87 14:15:06
Edit-checked: Yes Document: No TCO-tested: Yes
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: STG
Problem:
Not enough SNOOP pages to run WATCH and SYSDPY at the same time.
Diagnosis:
SNPDPC is set to a mere 12 pages.
Solution:
Increase SNPDPC to a number (like 30) so that SYSDPY and WATCH
can coexist.
[End of TCO 7.1016]
TCO-number: 7.1017
Written-by: RASPUZZI Creation-date: 7-Jul-87 14:20:56
Edited-by: RASPUZZI Edit-date: 7-Jul-87 14:21:29
Edit-checked: Yes Document: No TCO-tested: Yes
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: JNTMAN
Problem:
The NODE% JSYS always takes the first six characters when verifying
a DECnet node.
Diagnosis:
DECnet nodes only have 6 characters. However, who's to say that a
moronic user can't pass NODE% a string longer than 6 characters?
Solution:
Have the .NDVFY function return COMX19 when the user passes a string
of more than 6 characters.
[End of TCO 7.1017]
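The difference between the old and new .NDVFY behavior in TCO 7.1017 can be shown with a small Python sketch. Both function names and the set of known nodes are made up for illustration; only the 6 character limit and the COMX19 error come from the TCO, and raising an exception stands in for the JSYS error return.

```python
MAX_NODE_NAME = 6  # DECnet node names have at most 6 characters


def verify_node_old(name, known_nodes):
    """Old behavior: silently look at the first six characters only."""
    return name[:MAX_NODE_NAME] in known_nodes


def verify_node_new(name, known_nodes):
    """New behavior: reject an over-long string instead of truncating it."""
    if len(name) > MAX_NODE_NAME:
        raise ValueError("COMX19")
    return name in known_nodes
```

Under the old rule a string such as KL2102EXTRA would verify as node KL2102; under the new rule it is refused.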
TCO-number: 7.1018
Written-by: RASPUZZI Creation-date: 9-Jul-87 09:36:26
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: STG
Problem:
Can't build monitor.
Diagnosis:
Too many SNOOP pages.
Solution:
Remove TCO 7.1016. WATCH and SYSDPY will have to fight it out.
[End of TCO 7.1018]
TCO-number: 7.1019
Written-by: RASPUZZI Creation-date: 9-Jul-87 16:11:39
Edit-checked: No Document: Yes TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: MEXEC
Problem:
The "R" in the "Why reload?" question is capitalized when the
question times out.
Diagnosis:
This is a major catastrophic inconsistency.
Solution:
Shoot engineer, then make the "R" be lowercase.
[End of TCO 7.1019]
TCO-number: 7.1020
Written-by: RASPUZZI Creation-date: 14-Jul-87 14:11:32
Edit-checked: No Document: Yes TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: JSYSA
Problem:
A system administrator can expire any user's password but the
unprivileged user can override the system administrator's work.
Diagnosis:
Half baked implementation of password expiration. Non-WHEELies
should not be able to change their password expiration.
Solution:
Only allow WHEELed (or OPERATORs) to change the .CMPMU and .CMPED
words when doing a CRDIR%.
[End of TCO 7.1020]
TCO-number: 7.1021
Written-by: LOMARTIRE Creation-date: 15-Jul-87 10:11:11
Edited-by: LOMARTIRE Edit-date: 6-Aug-87 15:34:26
Edit-checked: Yes Document: No TCO-tested: Yes
Maintenance-release: No Hardware-related: No
Program: Monitor
Routines-affected: PISC7 CLOSV4 LSNUP CFSDMP CFDLSN CFSJYN
Related-TCO: 7.1033
Problem:
There is no easy way to obtain a simultaneous dump of the entire cluster.
Diagnosis:
No code to do it.
Solution:
Add the cluster dump facility.
[End of TCO 7.1021]
TCO-number: 7.1022
Written-by: RASPUZZI Creation-date: 17-Jul-87 10:40:37
Edited-by: RASPUZZI Edit-date: 17-Jul-87 10:43:53
Edit-checked: Yes Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: MEXEC
Related-SPR: 21621
Problem:
ILMNRF crashes out of JOBCOF.
Diagnosis:
The monitor assumes that the controlling terminal will not go away
in routine LDTACH before it puts this terminal number in a STKVAR
variable. Unfortunately, there is a case where the controlling terminal
can go away and cause the monitor to pick up a bogus value out of
this STKVAR location.
Solution:
Save the controlling terminal in the STKVAR variable as soon as it has
been discovered that the controlling terminal still exists.
[End of TCO 7.1022]
TCO-number: 7.1023
Written-by: RASPUZZI Creation-date: 18-Jul-87 09:47:18
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: FUTILI
Problem:
DLUSER doesn't work.
Diagnosis:
Edit 7411 attempted to prevent ILMNRFs in the wrong place.
Solution:
Remove edit 7411 and figure out where the fix really should be.
[End of TCO 7.1023]
TCO-number: 7.1024
Written-by: MCCOLLUM Creation-date: 21-Jul-87 10:53:43
Edited-by: MCCOLLUM Edit-date: 21-Aug-87 17:08:57
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: LATSRV CTHSRV
Related-TCO: 7.1037 7.1042 7.1043
Problem:
Section 0/1 address space is full. The Increase Structure Limit project
requires section 0/1 address space.
Diagnosis:
LATSRV and CTHSRV can be made to run in section XCDSEC.
Solution:
Move the code in LATSRV.MAC and CTHSRV.MAC into section XCDSEC.
[End of TCO 7.1024]
TCO-number: 7.1026
Written-by: RASPUZZI Creation-date: 23-Jul-87 14:51:32
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: TCPJFN
Related-SPR: 21611
Problem:
Repeated IPGCOL and INTFR6 BUGINFs and IP free space has disappeared.
Diagnosis:
Doing a GTJFN% on a TCP: device causes a prototype TCB to be allocated
from IP free space. If the user attempts to OPENF% this JFN and
the open fails, the free space is lost when the user discards the
JFN.
Solution:
Teach TCPOP5 to return IP free space when the open on the TCP connection
fails for the TCP: JFN.
[End of TCO 7.1026]
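The leak in TCO 7.1026 is the familiar "allocate early, fail later, never give back" pattern. The Python sketch below models it under loud assumptions: IpFreeSpace, gtjfn_tcp, and openf_tcp are hypothetical stand-ins for IP free space, GTJFN%, and the TCPOP5 open path, and ConnectionError stands in for whatever makes the open fail.

```python
class IpFreeSpace:
    """Counts outstanding blocks so a leak shows up as a nonzero total."""

    def __init__(self):
        self.outstanding = 0

    def allocate(self):
        self.outstanding += 1
        return object()  # opaque stand-in for a prototype TCB

    def release(self, block):
        self.outstanding -= 1


def gtjfn_tcp(pool):
    """Getting a JFN on TCP: allocates a prototype TCB right away."""
    return {"tcb": pool.allocate()}


def openf_tcp(pool, jfn, connect):
    """Open the JFN; on failure, hand the prototype TCB back to the pool."""
    try:
        connect(jfn["tcb"])
    except ConnectionError:
        pool.release(jfn["tcb"])  # the step the fix adds
        jfn["tcb"] = None
        raise
```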
TCO-number: 7.1027
Written-by: RASPUZZI Creation-date: 28-Jul-87 14:24:53
Edited-by: RASPUZZI Edit-date: 28-Jul-87 14:26:14
Edit-checked: Yes Document: No TCO-tested: Yes
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: TCPTCP
Related-SPR: 21610
Problem:
ILMNRF or possible ILPSEC BUGHLTs.
Diagnosis:
Routine TRMPKT is not resetting an AC properly before calling
routines to free IP free space blocks.
Solution:
Have TRMPKT reset T1 to the appropriate value before getting to
code that calls RETPKT.
[End of TCO 7.1027]
TCO-number: 7.1029
Written-by: RASPUZZI Creation-date: 29-Jul-87 08:15:51
Edited-by: LOMARTIRE Edit-date: 6-Aug-87 15:38:53
Edit-checked: Yes Document: No TCO-tested: Yes
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: CFSSRV
Problem:
CFCTNF BUGHLTs.
Diagnosis:
When CFS does garbage collection to cleanup tokens that are no longer
in use, it calls routine CFSSPC. Eventually, CFSSPC gets to routine
CFSRS0 which attempts to collect all unused CFS tokens. The problem
is this routine is cleaning up cached tokens also. Somehow, the keep
bit is cleared for this particular cached token. Once the token has
been removed and the OFN is garbage collected, the cached token will not
be found and the CFCTNF BUGHLT will result.
Solution:
Have CFSRS0 not remove any cached token. Also, ensure that the keep bit
is set every time the token access marker (indicating a cached token) is
set. Finally, when routine CFSAWT/CFSAWP exits, check to see if the keep
bit is set. If not, then issue a CFKBNS BUGHLT.
[End of TCO 7.1029]
TCO-number: 7.1030
Written-by: RASPUZZI Creation-date: 30-Jul-87 14:23:43
Edited-by: RASPUZZI Edit-date: 30-Jul-87 14:24:41
Edit-checked: Yes Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: MSTR
Problem:
MOUNTR is getting a local job number from the MONITOR during
structure mount increments and decrements.
Diagnosis:
Yet another forgotten spot where a global job number should be
used.
Solution:
Have the MONITOR send the global job number in the IPCF packet
to MOUNTR instead of the local job number.
[End of TCO 7.1030]
TCO-number: 7.1032
Written-by: RASPUZZI Creation-date: 4-Aug-87 14:21:18
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: IO
Problem:
Device or data errors being returned through the SIN% JSYS when
everything appears to be fine with the device.
Diagnosis:
Edit 7391 checks the error code explicitly for IOX7 (JSB free space
exhausted) and if the error is not IOX7, it changes it to IOX5. This
is bad if the routine is entered with MONX02 error.
Solution:
Have the routine check for MONX02. If the error was MONX02, have it
changed to IOX7 and not IOX5.
[End of TCO 7.1032]
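The error translation in TCO 7.1032 is easiest to see as a before and after pair. This Python sketch uses the error mnemonics as plain strings; the two function names are hypothetical and the real check lives in the monitor's IO module.

```python
def map_error_edit_7391(code):
    """Behavior after edit 7391: anything that is not IOX7 becomes IOX5."""
    return "IOX7" if code == "IOX7" else "IOX5"


def map_error_tco_7_1032(code):
    """Behavior after this TCO: MONX02 is reported as IOX7, not IOX5."""
    if code in ("IOX7", "MONX02"):
        return "IOX7"
    return "IOX5"
```

So a free space shortage that arrives as MONX02 no longer masquerades as a device or data error.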
TCO-number: 7.1033
Written-by: LOMARTIRE Creation-date: 6-Aug-87 15:33:05
Edited-by: LOMARTIRE Edit-date: 6-Aug-87 15:34:25
Edit-checked: Yes Document: No TCO-tested: Yes
Maintenance-release: No Hardware-related: No
Program: Monitor
Routines-affected: CFSDMP
Related-TCO: 7.1021
Problem:
PITRAP BUGHLTs when doing cluster dump.
Diagnosis:
Routine CFSDMP sets the stack to be the extended dump stack and then does an
EA.ENT to enter section 1. Well, $EAENT does a HRRZS 0(P) in order to zero the
flags of the (assumed) return PC which is on the top of the stack. CFSDMP is
called from section 0 from PISC7, so the HRRZS was done to 0,,address instead
of 11,,address. This produces indeterminate results - usually BUGHLTs!
Solution:
Replace the EA.ENT with a XJRST [MSEC1,,.+1] to enter section 1.
[End of TCO 7.1033]
TCO-number: 7.1034
Written-by: RASPUZZI Creation-date: 10-Aug-87 14:05:06
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: IPCF MEXEC GLOBS
Problem:
Cluster INFO% and Dump on BUGCHK need some 0/1 space.
Diagnosis:
Section 0/1 space is a commodity.
Solution:
Move IPCF out of section 0/1. This gains about 4 pages of space.
[End of TCO 7.1034]
TCO-number: 7.1035
Written-by: GSCOTT Creation-date: 11-Aug-87 15:52:23
Edited-by: GSCOTT Edit-date: 14-Aug-87 10:09:19
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: PHYSIO
Related-SPR: 94
Problem:
PDBSTA BUGCHKs followed by OVRDTA BUGCHK when booting system with two or
more drives dual ported to the same pair of RH20s on the same system.
Diagnosis:
CHBDON gets the UDBST1 flags from the secondary UDB when a home block check
is done using the secondary path. It discovers that UDBST1 is zero, and the
result is the PDBSTA.
Solution:
Add code at PDBSTA to point P3 back to the primary UDB if P3 is pointing
to the secondary UDB.
[End of TCO 7.1035]
TCO-number: 7.1036
Written-by: LOMARTIRE Creation-date: 12-Aug-87 10:30:29
Edit-checked: Yes Document: No TCO-tested: Yes
Maintenance-release: No Hardware-related: No
Program: Monitor
Routines-affected: PAGEM PAGUTL GLOBS
Related-QAR: 21659
Problem:
The "OFN lock tracer" feature enabled via the SPTDSW switch does not work
anymore.
Diagnosis:
It was broken during 6.1 development and also additional trace points are
needed.
Solution:
Add the additional trace points and make routine SPTRAC global.
[End of TCO 7.1036]
TCO-number: 7.1037
Written-by: MCCOLLUM Creation-date: 12-Aug-87 12:45:14
Edited-by: LOMARTIRE Edit-date: 2-Sep-87 14:17:05
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: SCAMPI
Related-TCO: 7.1024 7.1042 7.1043 7.1048
Problem:
More section 0/1 address space is needed for the Increase Structure Limit
project.
Diagnosis:
SCAMPI can be made to run in section XCDSEC.
Solution:
Move the code in SCAMPI.MAC into section XCDSEC.
[End of TCO 7.1037]
TCO-number: 7.1040
Written-by: LOMARTIRE Creation-date: 19-Aug-87 12:30:04
Edit-checked: Yes Document: No TCO-tested: Yes
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: ENQ
Problem:
We still need more 0/1 space.
Diagnosis:
Most of ENQ.MAC can be moved into XCDSEC.
Solution:
Move all of ENQ.MAC into XCDSEC except for routines: .ENQ, .DEQ, .ENQC,
ENQCD, ENQTST, ENQFKR, ENQCLS, and ENQINI. Also, repaginate the listing to make
it more readable.
[End of TCO 7.1040]
TCO-number: 7.1041
Written-by: LOMARTIRE Creation-date: 19-Aug-87 12:33:50
Edit-checked: Yes Document: Yes TCO-tested: Yes
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: ENQ
Problem:
Functions 1 and 2 of ENQC% do not work correctly when a global job number
or -1 is supplied for the job number.
Diagnosis:
An IFNSK. has to be changed to an IFSKP..
Solution:
Do it. Also, there is no mention in the documentation that a -1, supplied
as the job number in the argument block, means your own job.
[End of TCO 7.1041]
TCO-number: 7.1042
Written-by: MCCOLLUM Creation-date: 19-Aug-87 16:30:24
Edited-by: MCCOLLUM Edit-date: 21-Aug-87 17:09:02
Edit-checked: No Document: Yes TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: STG
Related-TCO: 7.1037 7.1024 7.1043
Problem:
TOPS-20 will only allow a maximum of 32 (decimal) structures to be mounted
system-wide.
Diagnosis:
There is not enough "units" free space in the section 0/1 resident free
space pool to allow for more than 32 structures.
Solution:
Create more section 0/1 free space by moving code to XCDSEC. See the
related TCOs for more details on this. Increase the "units" free space
from 17500 (octal) words to 37200 words. Reorganize the JSB to allow
the JSSTRT table (which is based on the number of structures) to
increase in size. Increase STRN from 32 to 64 (decimal).
[End of TCO 7.1042]
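For readers who do not think in octal, the sizes in TCO 7.1042 work out as below. The sketch assumes that the new figure, 37200, is also octal (the TCO only marks the old one explicitly); the variable names are illustrative.

```python
# Sizes of the "units" free space pool, in 36-bit words.
old_units_words = 0o17500   # 8000 decimal
new_units_words = 0o37200   # 16000 decimal, assuming this value is octal too

old_strn = 32               # structure limit before the change (decimal)
new_strn = 64               # structure limit after the change (decimal)

# Words of "units" free space available per structure, before and after.
old_per_structure = old_units_words // old_strn
new_per_structure = new_units_words // new_strn

print(old_units_words, new_units_words)      # 8000 16000
print(old_per_structure, new_per_structure)  # 250 250
```

Under that assumption the pool exactly doubles along with STRN, so the space available per structure stays the same.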
TCO-number: 7.1043
Written-by: MCCOLLUM Creation-date: 21-Aug-87 17:08:57
Edited-by: MCCOLLUM Edit-date: 21-Aug-87 17:10:16
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: SCSJSY
Related-TCO: 7.1024 7.1037 7.1042
Problem:
Section 0/1 address space is nearly exhausted.
Diagnosis:
Increasing the structure limit to 64 (decimal) via TCO 7.1042 used
up much section 0/1 address space.
Solution:
Reclaim the section 0/1 address space by moving the code in SCSJSY.MAC
to section XCDSEC.
[End of TCO 7.1043]
TCO-number: 7.1044
Written-by: RASPUZZI Creation-date: 27-Aug-87 14:49:45
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: JSYSF
Related-SPR: 21672
Problem:
The RFTAD% JSYS does not handle argument blocks that are not in the
local section when it is called from a different section outside
of the argument block.
Diagnosis:
RFTAD% uses a BLT to user space when initializing the argument block.
This BLT appears to be a mysterious NOP.
Solution:
Make RFTAD% call routine BLTUU to initialize the user's argument block.
[End of TCO 7.1044]
TCO-number: 7.1045
Written-by: RASPUZZI Creation-date: 28-Aug-87 09:08:45
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: IPCF
Problem:
ILMNRFs when attempting to delete a directory with archived files.
Diagnosis:
Slight oversight when moving IPCF into section 6. Mainly, CPTAB is
in GTJFN and is not an immediate value.
Solution:
Make instruction that loads the byte from CPTAB run in section 1.
[End of TCO 7.1045]
TCO-number: 7.1046
Written-by: RASPUZZI Creation-date: 28-Aug-87 09:11:59
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: JSYSA
Problem:
NTINF% does not work with terminal line 0.
Diagnosis:
Fence post error that was accidentally introduced with edit 7410.
Solution:
Make sure NTRRH1 checks for a 0 line too.
[End of TCO 7.1046]
TCO-number: 7.1047
Written-by: RASPUZZI Creation-date: 1-Sep-87 14:47:18
Edited-by: RASPUZZI Edit-date: 1-Sep-87 14:49:08
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: COMND
Problem:
Cannot define logical names with _^Vs in them.
Diagnosis:
Semi brain dead part of edit 7413. It checks the break mask when
parsing a _^V and then, get this, decides to RETBAD a COMNX4! This
puts COMND in a funny state and all sorts of wild things come back
to the user.
Solution:
Prevent COMND from going to Garkland by checking the function when
a _^V is seen. If the function is .CMUSR or .CMDIR, then have COMND
return to the user via its NOPARS macro.
[End of TCO 7.1047]
TCO-number: 7.1048
Written-by: LOMARTIRE Creation-date: 2-Sep-87 14:17:03
Edit-checked: Yes Document: No TCO-tested: Yes
Maintenance-release: No Hardware-related: No
Program: Monitor
Routines-affected: SCAMPI
Related-TCO: 7.1037
Problem:
Routines SC.ABF and SC.ALD now return "6,,return-address" as the return
address.
Diagnosis:
The XMOVEI now gets section 6 as a result of the code move.
Solution:
Change the instructions to MOVE AC,[MSEC1,,return-address].
[End of TCO 7.1048]
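The "section,,address" notation in TCO 7.1048 denotes a word with the section number in the left half and the in-section address in the right half. The Python sketch below shows why a return address built relative to the running code changes when that code moves from section 1 to section 6. The helper names are hypothetical, MSEC1 is taken to be section 1 as the related TCOs imply, and xmovei_local is only a rough model of XMOVEI on a local address.

```python
HALF_WORD = 0o777777  # 18 bits
MSEC1 = 1             # assumed: monitor section 1


def global_address(section, offset):
    """Build section,,offset as a single integer."""
    return (section << 18) | (offset & HALF_WORD)


def xmovei_local(current_section, offset):
    """Rough model: a local address picks up the section the code runs in."""
    return global_address(current_section, offset)


return_address = 0o1234                               # illustrative offset
after_move = xmovei_local(6, return_address)          # code now in section 6
fixed = global_address(MSEC1, return_address)         # MOVE AC,[MSEC1,,addr]
```

Spelling out the section in a literal, as the solution does, makes the result independent of where the code happens to live.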
TCO-number: 7.1049
Written-by: GSCOTT Creation-date: 3-Sep-87 16:01:18
Edited-by: GSCOTT Edit-date: 3-Sep-87 16:26:47
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: JSYSA
Problem:
Bad terminal numbers passed via USAGE% JSYS to monitor can cause various
BUGHLTs including ILPPT3. Also USAGE% JSYSes can fail with "default
item not allowed" errors when the line number we are trying to supply is
-1, indicating that the job is detached (this is seen when running LPTSPL
under job 0).
Diagnosis:
No line number range checking is done and the code doesn't handle -1
properly for a line number.
Solution:
Add code in JSYSA to properly range check the line number and allow -1
to be specified.
[End of TCO 7.1049]
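The range check that TCO 7.1049 adds amounts to the test below. This is a Python sketch with hypothetical names; the number of lines is passed in rather than taken from any monitor symbol, and -1 is accepted because it marks a detached job.

```python
DETACHED = -1  # line number supplied for a detached job


def valid_line_number(line, line_count):
    """Accept -1 (detached) or a real line number in 0..line_count-1."""
    return line == DETACHED or 0 <= line < line_count
```

Note that line 0 is a legal line, which is the same fence post that TCO 7.1046 had to repair in NTINF%.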
TCO-number: 7.1050
Written-by: MCCOLLUM Creation-date: 4-Sep-87 09:42:29
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: FREE
Problem:
PHYICE BUGINFs when booting a system.
Diagnosis:
When MSCP units are coming online, PHYMSC calls routine ASGRES to obtain
section 0 resident free space with which to build the unit's UDB. If
ASGRES determines that there are not enough pages of resident free space
locked into core, it calls routine GRORES to attempt to lock down
additional pages. Unfortunately, MSCP disks come online at interrupt
level and page faults cannot be tolerated. Therefore, GRORES fails,
a PHYICE BUGINF is generated and all further disks on the current
controller are ignored (and possibly the controller itself).
Solution:
At system startup, routine RESLCK is called to lock down a fixed number
of resident free space pages. During normal operation of the system, this
many spare pages of free space always remain locked down. While this
is sufficient at most times, more pages are required during system startup.
Add an additional entry point to RESLCK (RESLCI) that can be called to
lock down twice as many resident free space pages as normal. Call this
entry point from RUNDD which runs at system startup. Later, CHKR will
call the RESLCK entry point which will unlock any pages above and beyond
the normal value.
[End of TCO 7.1050]
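The two entry points in TCO 7.1050 can be pictured with a toy model. In this Python sketch the class ResidentPool and its fields are invented for illustration; only the names RESLCK and RESLCI and the "twice as many as normal" rule come from the TCO.

```python
class ResidentPool:
    """Toy model of the spare resident free space pages kept locked down."""

    def __init__(self, normal_spare):
        self.normal_spare = normal_spare  # pages kept locked in normal running
        self.locked = 0

    def reslci(self):
        """Startup entry: lock down twice the normal number of pages."""
        self.locked = max(self.locked, 2 * self.normal_spare)

    def reslck(self):
        """Normal entry: settle at the normal number, unlocking any extras."""
        self.locked = self.normal_spare
```

The extra pages exist only across startup, when units come online at interrupt level and the pool cannot grow by page faulting.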
TCO-number: 7.1051
Written-by: GSCOTT Creation-date: 8-Sep-87 16:11:17
Edit-checked: No Document: Yes TCO-tested: No
Maintenance-release: No Hardware-related: Yes
Program: MONITOR
Routines-affected: MEXEC JSYSA globs STG
Problem:
SYSTAT NODE * command may be slow in a cluster environment.
Diagnosis:
A large number of INFO% or GETJI% JSYSes are needed to find all of the
active jobs on the cluster; each system must have jobs 1 to 511 checked
for a SYSTAT NODE * command. In addition, the header printed by the
SYSTAT command shows the number of user and operator jobs logged in.
In order to get this information jobs 1 to 511 must be checked for
each system that is displayed by the SYSTAT NODE * command.
Solution:
Add two new GETAB words to the SYSTAT table. The first word (ACTJOB) will
contain the lowest,,highest job number in use on each system. This will
be used by SYSTAT NODE * to only search a range of jobs on a system rather
than searching jobs 1 to 511. The second GETAB word (WHOJOB) will contain
operator,,user jobs logged into the system. This word can be directly used
by the SYSTAT command rather than searching jobs 1 to 511 looking for which
jobs are logged in. The addition of these two words should make the
*** PERFORMANCE *** of the SYSTAT NODE * command bearable.
[End of TCO 7.1051]
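The two GETAB words in TCO 7.1051 each pack a pair of values into the halves of a 36-bit word. A Python sketch of how a SYSTAT-like program might use them follows; the helper names and the sample values are made up, and only the lowest,,highest and operator,,user layouts come from the TCO.

```python
HALF_WORD = 0o777777  # 18 bits


def pack_halves(left, right):
    """Build left,,right as one 36-bit value."""
    return ((left & HALF_WORD) << 18) | (right & HALF_WORD)


def unpack_halves(word):
    """Split a 36-bit value into (left, right)."""
    return (word >> 18) & HALF_WORD, word & HALF_WORD


actjob = pack_halves(3, 41)   # sample ACTJOB: lowest,,highest job in use
whojob = pack_halves(2, 17)   # sample WHOJOB: operator,,user jobs logged in

lowest, highest = unpack_halves(actjob)
jobs_to_check = range(lowest, highest + 1)   # instead of range(1, 512)

operator_jobs, user_jobs = unpack_halves(whojob)
```

With these sample values the search covers 39 jobs instead of 511, and the header counts need no search at all.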
TCO-number: 7.1052
Written-by: WADDINGTON Creation-date: 8-Sep-87 16:41:34
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: latsrv
Problem: LHDSP macro can cause LINK to loop during monitor builds.
Diagnosis: We changed various IFIWs to XADDR.s. LHDSP (and LSLDT) use .ORG
to fill values into various tables. This smashes fixup chains of various
types, which can cause LINK to loop, or code to be loaded incorrectly, or
other bizarre symptoms.
Solution: Get rid of LHDSP and LSLDT.
[End of TCO 7.1052]
TCO-number: 7.1053
Written-by: GSCOTT Creation-date: 10-Sep-87 07:21:35
Edited-by: GSCOTT Edit-date: 10-Sep-87 16:21:56
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: monsym
Problem:
Orion complains "Invalid type argument in message" from some component.
Another symptom is that you never get "DECnet link messages" from the
monitor.
Diagnosis:
MONSYM defines .QBDMX to be 5 then redefines it to be 4. DECnet link
messages are type 5, so ORION thinks that they are invalid since .QBDMX
ends up being 4.
Solution:
Remove brain damage in MONSYM, so that .QBDMX is 5.
[End of TCO 7.1053]
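The failure in TCO 7.1053 is a case of "last definition wins". The Python sketch below imitates it with a dictionary standing in for the symbol table; the function valid_message_type is hypothetical and checks only the upper bound, since the TCO does not say what the lower bound is.

```python
symbols = {}
symbols[".QBDMX"] = 5   # first definition: highest valid message type
symbols[".QBDMX"] = 4   # later redefinition silently replaces it

DECNET_LINK_MESSAGE = 5  # type of the DECnet link messages, per the TCO


def valid_message_type(message_type, maximum):
    """Upper-bound check of the kind ORION is described as making."""
    return message_type <= maximum


rejected = not valid_message_type(DECNET_LINK_MESSAGE, symbols[".QBDMX"])
accepted_after_fix = valid_message_type(DECNET_LINK_MESSAGE, 5)
```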
TCO-number: 7.1054
Written-by: GSCOTT Creation-date: 10-Sep-87 16:25:32
Edited-by: GSCOTT Edit-date: 10-Sep-87 19:52:14
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: phyx2
Problem:
Many PHYX2 BUGCHKs don't supply enough information such as channel,
DX20 number, and drive number. Some of the "Action:" areas could
also be improved.
Diagnosis:
Several of the BUGCHKs weren't documented when the code was written,
and since DX20s are so reliable we don't see too many DX2xxx BUGCHKs.
Solution:
Fix up PHYX2's BUG. macros as needed.
[End of TCO 7.1054]
TCO-number: 7.1055
Written-by: RASPUZZI Creation-date: 15-Sep-87 15:45:55
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: GLOBS PAGUTL DISC JSYSF
Problem:
CFCTNF BUGHLTs along with CFKBNS BUGHLTs.
Diagnosis:
When a user deletes a directory, the CFS resource block describing
this directory is placed on the free list. However, cached OFNs
may still exist. A new OFN may now grab this block and use it for
a file open token. Now we have a scenario where the block is being
used for 2 resources. The cached OFN may release this block and cause
the keep bit to be cleared. When this happens, the CFS garbage collector
returns the resource block back to CFS' free list. This is bad as some
other OFN was using this block as a file open token.
Solution:
Currently, a directory cannot be destroyed if some file is still open
in this directory. However, REMALC is always called and, hence, always
removes the resource block and returns it to CFS. While this is not
incorrect, it is inherently dangerous. Have a new routine uncache
all OFNs associated with this resource block before deleting the
directory. This will help prevent CFS from allowing a block to be
used for 2 different resources.
[End of TCO 7.1055]
TCO-number: 7.1056
Written-by: GSCOTT Creation-date: 15-Sep-87 16:12:05
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: TTYSRV DTESRV
Problem:
[DECSYSTEM-20 continued] messages come out on lines that are not
connected to the console front end when it crashes.
Diagnosis:
Monitor (DTESRV) sends to all lines when the system is reloaded; it should
just send the reloaded message to front end lines.
Solution:
Implement a -2 line number argument to TTMSG, which can only be specified
by the monitor, which will do a sendall to only front end lines. Then
have DTESRV use this new function.
[End of TCO 7.1056]
TCO-number: 7.1057
Written-by: GSCOTT Creation-date: 15-Sep-87 16:19:50
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: IPFREE
Problem:
Occasional failure to allocate blocks of the size requested.
Diagnosis:
Code which ensures that block sizes are quantized is a bit too conservative.
Solution:
In IPFREE, at GETBB2, change to allow split blocks.
[End of TCO 7.1057]
TCO-number: 7.1058
Written-by: GSCOTT Creation-date: 17-Sep-87 14:35:36
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: IMPDV
Problem:
TCP/IP-20 does not support class B and C networks as the SPD states.
Diagnosis:
No code to support B and C networks.
Solution:
Add code to support class B and C networks to IMPDV.
[End of TCO 7.1058]
TCO-number: 7.1059
Written-by: GSCOTT Creation-date: 17-Sep-87 16:23:22
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: JSYSF DISC
Problem:
Byte counts of 34359738367, used by LIBOL files, get turned into byte
counts of 1073741823 when these files are copied.
Diagnosis:
OFNLEN truncates the byte count to 30 bits so that it can put the byte
size in bits 0-5. A real file of that length is impossible to create
and 34359738367(36) is meaningful to COBOL. When the monitor truncates
the byte count, COBOL programs get confused.
Solution:
Set OFNLEN to -1 if the byte count is 34359738367(36), and each time that
OFNLEN is referenced and is -1, report the byte count and size to be
34359738367(36).
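The truncation and the -1 marker can be illustrated with a minimal Python
sketch. It is not the monitor code: the function names and the exact packing
are assumptions made for illustration, based only on the description above
(byte size in bits 0-5, the high-order six bits, leaving 30 bits for the count).

```python
COUNT_BITS = 30                      # bits left for the count after 6 bits of byte size
COUNT_MASK = (1 << COUNT_BITS) - 1   # 1073741823, the truncated value seen after a copy
LIBOL_COUNT = (1 << 35) - 1          # 34359738367, the count used by LIBOL files
LIBOL_SIZE = 36                      # byte size that accompanies that count
ALL_ONES = (1 << 36) - 1             # a 36-bit word of all ones, i.e. -1


def pack_ofnlen(byte_size, byte_count):
    """Pack size and count into one 36-bit word (hypothetical layout).

    The LIBOL combination cannot fit in 30 bits, so it is stored as -1.
    """
    if byte_size == LIBOL_SIZE and byte_count == LIBOL_COUNT:
        return ALL_ONES
    return (byte_size << COUNT_BITS) | (byte_count & COUNT_MASK)


def unpack_ofnlen(word):
    """Return (byte_size, byte_count), expanding the -1 marker."""
    if word == ALL_ONES:
        return LIBOL_SIZE, LIBOL_COUNT
    return word >> COUNT_BITS, word & COUNT_MASK
```

Masking the LIBOL count to 30 bits gives 1073741823, which is the value the
problem statement reports for copied files.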
[End of TCO 7.1059]
TCO-number: 7.1060
Written-by: RASPUZZI Creation-date: 21-Sep-87 15:45:09
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: SCAMPI GLOBS SCAPAR
Problem:
A routine is needed to break data up and store it in a series of
SCA buffers.
Diagnosis:
SCAMPI does not have a routine to do this.
Solution:
Be ambitious and write SC.BRK.
[End of TCO 7.1060]
TCO-number: 7.1062
Written-by: LOMARTIRE Creation-date: 22-Sep-87 16:45:11
Edit-checked: Yes Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: Monitor
Routines-affected: PAGUTL
Related-SPR: 21567
Problem:
FILBAT BUGINFs no longer issued after edit 7247.
Diagnosis:
RELOFN no longer returns the SPTH word in T2.
Solution:
Make RELOFN do so and document the behavior in the routine
header.
[End of TCO 7.1062]
TCO-number: 7.1063
Written-by: MCCOLLUM Creation-date: 23-Sep-87 10:05:32
Edited-by: MCCOLLUM Edit-date: 29-Sep-87 18:32:25
Edit-checked: No Document: Yes TCO-tested: No
Maintenance-release: No Hardware-related: Yes
Program: MONITOR
Routines-affected: PHYSIO JSYSA GTJFN JSYSF DISC DSKALC
MEXEC MSTR PAGUTL STG
Problem:
Jobs hang accessing disk structures that are composed of at least one disk
unit that is offline. Also, DDMPNR and CHKRNR BUGHLTs may be seen.
Diagnosis:
If a disk unit goes offline and jobs subsequently attempt to access the
structure to which the unit belongs, they will become hung. This occurs
because they attempt to do I/O to the disk and the monitor permits it,
despite the fact that the monitor is aware that the disk is offline.
The monitor routine DDOCFS, which is a part of the DDMP process, writes
OFNs out to disk at the request of CFS. However, DDOCFS does not check the
state of the structure before attempting this operation. If the structure
is composed of one or more offline disks, then DDMP will hang. This can
result in the above mentioned BUGHLTs.
With the addition of CI disks in release 6.0 of the monitor, the observed
frequency of these problems has increased dramatically.
Solution:
Implement the Offline Structures feature. This feature involves the
addition of a new bit to the SDBSTS word of the Structure Data Block (SDB).
This bit, MS%OFS, will be turned on and off by a new routine, UDBCHK, in
PHYSIO. When UDBCHK notices that a disk unit has gone offline, it will time
stamp word SDBTMR in the SDB of the structure associated with the unit.
After a variable timeout period has passed, UDBCHK will turn on the MS%OFS
bit in the SDB of the structure. This timeout interval can be set via the
SMON% .SFOFS function. When the MS%OFS bit is lit, no new access to the
disk will be allowed. This will be done by having JSYS's ACCES%, CHFDB%,
CHKAC%, CRDIR%, DELDF%, DELF%, DELNF%, DIRST%, DSKOP%, GTDAL%, GTDIR%,
GTFDB%, GTJFN%, OPENF%, RCDIR%, SIZEF%, and MSTR% check the MS%OFS bit and
return the STRX10 (Structure is offline) error message to the process. This
will prevent the process from attempting the I/O that will hang it up.
UDBCHK will continue to monitor the state of the disk unit that caused
MS%OFS to be turned on. When the disk unit goes online again, MS%OFS will
be cleared in the SDB if all other units in the structure are also online.
The following changes are made as a part of this project:
1) SETSPD will have two new commands added to it. The first is
DISABLE OFFLINE-STRUCTURES. This command will turn this feature off.
When this is done, behaviour will be identical to pre-release 7.0
systems. The second command is the ENABLE OFFLINE-STRUCTURES command.
This command takes as an argument a timeout interval. Valid intervals
are controlled by the SMON% .SFOFS function and are 1 to 900 seconds.
Note that the default state of the Offline Structures feature is ENABLED
with a timeout interval of 60 seconds.
2) The EXEC will have two new commands added to it. They are ^ESET options.
The first is OFFLINE-STRUCTURES followed by a timeout interval or
a confirmation. Confirmation implies use the default value of 60 seconds.
The second is NO OFFLINE-STRUCTURES. This disables Offline Structures.
Also, the INFORMATION SYSTEM-STATUS command will display the state of
this feature along with the timeout interval if it is enabled and the
INFORMATION STRUCTURE command will state that the structure is offline
if MS%OFS is on in the results of an MSTR% .MSGSS function.
3) DDOCFS will call routine STROFL to ensure the structure to which
it is writing out an OFN is accessible. If it is not, no attempt will
be made to write out the OFN. This should reduce the number of DDMPNR
and NOCHKR BUGHLTs observed.
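The timestamp-then-timeout behaviour described for UDBCHK can be sketched in
Python as follows. This is an illustrative model under stated assumptions, not
the PHYSIO code: the class, its method names and the string return values are
hypothetical, and only the 1 to 900 second range, the 60 second default, and
the roles of SDBTMR, MS%OFS and STRX10 come from the text above.

```python
DEFAULT_TIMEOUT = 60                 # seconds, the default stated above
MIN_TIMEOUT, MAX_TIMEOUT = 1, 900    # valid range stated above


class Structure:
    """Hypothetical model of one structure and its disk units."""

    def __init__(self, units, timeout=DEFAULT_TIMEOUT):
        if not MIN_TIMEOUT <= timeout <= MAX_TIMEOUT:
            raise ValueError("timeout must be 1 to 900 seconds")
        self.online = {unit: True for unit in units}
        self.timeout = timeout
        self.offline_since = None    # plays the role of SDBTMR
        self.ms_ofs = False          # plays the role of MS%OFS

    def check(self, now):
        """Periodic check in the style of UDBCHK; 'now' is in seconds."""
        if all(self.online.values()):
            # Every unit is back online: clear the timestamp and the bit.
            self.offline_since = None
            self.ms_ofs = False
        elif self.offline_since is None:
            # First time a unit is seen offline: time stamp the structure.
            self.offline_since = now
        elif now - self.offline_since >= self.timeout:
            # The timeout has passed: refuse new access from now on.
            self.ms_ofs = True

    def open_file(self):
        """Stand-in for a JSYS that tests the bit before doing I/O."""
        return "STRX10" if self.ms_ofs else "OK"
```

The point of the delay is that a unit which drops offline briefly does not
cause errors to be returned; only an outage longer than the interval does.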
[End of TCO 7.1063]
TCO-number: 7.1064
Written-by: GSCOTT Creation-date: 24-Sep-87 14:31:35
Edited-by: GSCOTT Edit-date: 24-Sep-87 14:33:49
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: TAPE
Problem:
There are various problems when reading or writing ANSI labeled tapes. When
the file name field was 17 characters without an embedded dot, IOX5 errors were
returned by the monitor resulting in the message "?File data error on file ..."
from the EXEC COPY command. Generation numbers ending in 00 were turned into
generation numbers ending in 44. Wildcarded copies from tape stopped after the
first file.
Diagnosis:
In the TAPE module, the monitor did not handle 17 character tape filenames
properly; it only allowed the length of the file name and file type field to be
16 characters (it was accounting for the dot that separates the file name and
file type fields even if there was no file type specified). Also in TAPE, the
monitor didn't split the generation number into the two fields on tape
(generation number and generation version) as specified in DEC STD 149 if the
last two digits of the generation number were zero. In GTJFN, the VANISH
routine, added by edit 7393, broke GNJFN% wildcarding on labeled tapes by
trying to find the "old" file before stepping versions.
Solution:
Insert proper code in TAPE to account for the dot in the filespec only if the
file type field is non-zero; insert code to properly handle generation numbers
and generation versions as specified in DEC STD 149; make VANISH only do the
extra lookup if the file is on disk.
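The length accounting for the dot can be sketched in Python. This is only an
illustration of the rule stated above; the function names are hypothetical and
the 17 character field width is taken from the problem description.

```python
FIELD_WIDTH = 17   # characters available for the tape file name field


def label_length(name, file_type=""):
    """Characters needed on tape; the dot counts only when a type is present."""
    if file_type:
        return len(name) + 1 + len(file_type)
    return len(name)


def fits(name, file_type=""):
    """True if the name (and optional type) fits in the tape field."""
    return label_length(name, file_type) <= FIELD_WIDTH
```

With this rule a 17 character name with no file type is accepted, whereas the
old accounting reserved a character for a dot that was never written.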
[End of TCO 7.1064]
TCO-number: 7.1065
Written-by: RASPUZZI Creation-date: 25-Sep-87 13:27:58
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: PAGUTL
Problem:
PLKRPQ BUGHLTs.
Diagnosis:
There seems to be a race between the swapper and the free space
grower. The free space grower is attempting to lock a page that
currently has a write in progress. As part of the PLKMOD fix, this
is not really allowed. When the swapper completes, it notices that
it had swapped out a locked page (because the free space grower
hadn't finished diddling the page age) and that's when the PLKRPQ
BUGHLT occurs.
Solution:
In MLKPG3, turn PI interrupts off while we diddle the page age
in routine AGESET to prevent this race. When we have changed the
age in AGESET, PIs will be turned back on and the swapper will
then do the right thing.
[End of TCO 7.1065]
TCO-number: 7.1066
Written-by: RASPUZZI Creation-date: 29-Sep-87 14:23:21
Edited-by: RASPUZZI Edit-date: 3-Nov-87 13:48:59
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: PAGUTL
Problem:
INVDIR no worky.
Diagnosis:
Engineer has been drinking too many wine coolers. He used the Tx
registers to save important information and routine OC.UNC was
trashing this information.
Solution:
Make INVDIR use the Qx registers for important things. Also make
INVDIR preserve these registers.
[End of TCO 7.1066]
TCO-number: 7.1067
Written-by: SHREFFLER Creation-date: 29-Sep-87 15:25:25
Edited-by: RASPUZZI Edit-date: 29-Sep-87 15:26:42
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: PAGUTL DISC GLOBS
Problem:
When renaming files between directories, sometimes the count of pages used
will not match the actual usage. It may even go negative but it always
corrects itself in time.
Diagnosis:
When a file is renamed its OFN is cached. This cached OFN follows the file
to the new directory. The problem is that this cached OFN has a pointer to
the disk allocation block of the old directory. If this file is renamed to
another directory and the cached OFN is still around the disk pages are
subtracted from the alloc block for the original directory instead of the
current directory.
Solution:
In the rename code, after releasing the OFN for the source file, uncache the
OFN.
[End of TCO 7.1067]
TCO-number: 7.1069
Written-by: RASPUZZI Creation-date: 13-Oct-87 14:15:17
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: SCAMPI
Problem:
Routine SC.BR1 does not get data out of the user's address space
from the correct section.
Diagnosis:
SC.BR1 was calling routine BLTUM to get the data. This routine
always gets stuff from section 0 of user address space.
Solution:
Have SC.BR1 use routine BLTUM1. This is the correct routine as
it uses the correct section number for the XBLT.
[End of TCO 7.1069]
TCO-number: 7.1070
Written-by: RASPUZZI Creation-date: 13-Oct-87 14:18:39
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: TTYSRV GLOBS STG
Problem:
Every second the scheduler is calling routine TTYCHK and this
routine is wasting time doing absolutely nothing.
Diagnosis:
Sometime ago this routine was used to check DZ lines. Now, it
is just a dinosaur.
Solution:
Remove TTYCHK from CLK2CL and remove the routine from TTYSRV.
[End of TCO 7.1070]
TCO-number: 7.1072
Written-by: LOMARTIRE Creation-date: 19-Oct-87 08:03:59
Edited-by: LOMARTIRE Edit-date: 24-May-88 16:18:37
Edit-checked: No Document: Yes TCO-tested: Yes
Maintenance-release: No Hardware-related: No
Program: Monitor
Routines-affected: CFSSRV ENQ ENQPAR ENQSRV GLOBS MEXEC
PHYKLP SCAPAR STG SYSFLG MONSYM
Related-TCO: 7.1088 7.1096 7.1103 7.1111 7.1115 7.1138 7.1142
7.1145 7.1172 7.1179 7.1180 7.1286 7.1292
Problem:
Under 6.1, the ENQ/DEQ functionality is limited to system-wide only. No
cluster-wide lock coordination is possible.
Diagnosis:
No code was written to support this feature.
Solution:
Add cluster-wide ENQ/DEQ functionality. This new feature will not be on
for the process by default. The process must do a new function to ENQ% (.ENECL)
in order to have its ENQ%/DEQ%/ENQC% requests assume the cluster-wide
implementation.
[End of TCO 7.1072]
TCO-number: 7.1074
Written-by: RASPUZZI Creation-date: 20-Oct-87 15:53:19
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: CFSSRV
Problem:
CFS has no way to return a CFS node name to other places in the
monitor without knowledge of CFS' host tables.
Diagnosis:
No code to do it.
Solution:
Add routine CFSNOD to CFSSRV to return a node name given a CI node
number.
[End of TCO 7.1074]
TCO-number: 7.1075
Written-by: GSCOTT Creation-date: 20-Oct-87 16:06:30
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: ALL
Problem:
Monitor modules without updated Table Of Contents.
Diagnosis:
Sometimes the TOC wasn't updated.
Solution:
Update modules to have a new TOC using EMACS.
[End of TCO 7.1075]
TCO-number: 7.1076
Written-by: RASPUZZI Creation-date: 20-Oct-87 21:40:38
Edit-checked: No Document: Yes TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: GLOBS CLUDGR CLUFRK CLUPAR MEXEC JSYSA
MONSYM STG FUTILI
Problem:
Currently, TOPS-20 offers no easy way for systems that are clustered
together to share information. The only way this can be done is if the
user writes his own SYSAP using the unsupported SCS% JSYS. Also, this
JSYS can only be used by users with WHEEL or OPERATOR privileges.
Diagnosis:
No code to do it.
Solution:
Add a new monitor SYSAP, the CLUDGR SYSAP, to aid users in gathering
information within a cluster by using the INFO% JSYS. Also, add code
in the EXEC to support cluster sendalls (which is part of the CLUDGR
SYSAP because of new hooks in TTMSG%) and also add code in the EXEC
to support cluster SYSTATs.
[End of TCO 7.1076]
TCO-number: 7.1078
Written-by: MCCOLLUM Creation-date: 21-Oct-87 16:53:50
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: GTJFN
Problem:
A JFNS% performed on a parse only JFN always shows the structure name,
even if you are currently connected to that structure.
Diagnosis:
TCO 7.1063 changed GTJFN% to not save the unique code of a structure in
the JFN block if the user specified parse only. However, JFNS% compares
this field against the connected structure and outputs the structure name
if they differ. Since it is always zero for a parse only, the structure
name is always displayed.
Solution:
Leave the unique code in the JFN block for parse-only JFNs.
[End of TCO 7.1078]
TCO-number: 7.1079
Written-by: MCCOLLUM Creation-date: 21-Oct-87 16:58:55
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: MSTR MONSYM
Problem:
STROFF BUGCHKs.
Diagnosis:
The STROFF BUGCHK was moved from routine STROFL to CKSTOF when the offline
structures feature was implemented. STROFL was changed to call CKSTOF.
CKSTOF is also called by a variety of JSYSes to verify structure numbers.
However, these JSYSes fetch the structure number from user supplied
arguments. If the user gives a bad argument, CKSTOF will issue a STROFF
BUGCHK.
Solution:
Move the STROFF BUGCHK back to routine STROFL. In CKSTOF, if the structure
number is invalid, return the STRX11 error code.
[End of TCO 7.1079]
TCO-number: 7.1080
Written-by: RASPUZZI Creation-date: 23-Oct-87 12:44:14
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: MONSYM STG CLUDGR
Problem:
System PID table is not large enough. There is no room in it to
add NEBULA's PID.
Diagnosis:
SPIDTB not big enough and no mnemonic in MONSYM for it.
Solution:
Add .SPNEB for system PID and .SDNEB for private GALAXY's to
MONSYM. Also make SPIDTB big enough to hold these.
[End of TCO 7.1080]
TCO-number: 7.1081
Written-by: GSCOTT Creation-date: 23-Oct-87 14:47:35
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: APRSRV DSKALC GLOBS MEXEC MSTR PAGUTL
PHYSIO SETSPD MONSYM DOB
Problem:
There is an occasional need to take a dump and continue the system.
Diagnosis:
No code to implement continuable dumps.
Solution:
Add DOB% JSYS and code in new module DOB to take continuable dumps. Change
modules APRSRV, DSKALC, GLOBS, MEXEC, MSTR, PAGUTL, PHYSIO to support DOB.
Change MONSYM and SETSPD utilities to support DOB.
[End of TCO 7.1081]
TCO-number: 7.1082
Written-by: WADDINGTON Creation-date: 23-Oct-87 15:30:53
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: monsym
Problem: No symbols for new LATOP% JSYS Functions.
Diagnosis: Add them to MONSYM.MAC
Solution: Add them to MONSYM.MAC
[End of TCO 7.1082]
TCO-number: 7.1086
Written-by: RASPUZZI Creation-date: 27-Oct-87 15:47:39
Edited-by: RASPUZZI Edit-date: 3-Nov-87 13:44:54
Edit-checked: No Document: Yes TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: MEXEC
Problem:
When a system in the cluster is being shutdown, there is no
way for other systems in the cluster to know this.
Diagnosis:
No code to do it.
Solution:
Now that cluster sendalls exist, make routine THSYS use this feature
to let other machines in the cluster know when a shutdown is going
to occur. This message is only broadcast throughout the cluster at
the 60 minute, 5 minute and 1 minute marks. Also, change the "System
going down" message to say which system is going down.
[End of TCO 7.1086]
TCO-number: 7.1087
Written-by: RASPUZZI Creation-date: 27-Oct-87 15:55:40
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: CLUFRK
Problem:
INFO% does not return job class, share or usage information when
a system has class scheduling turned on.
Diagnosis:
Wrong test used on AVALON to see if class scheduling is turned on.
Solution:
Change SKIPGE to SKIPL to see if AVALON implicates class scheduling.
[End of TCO 7.1087]
TCO-number: 7.1088
Written-by: LOMARTIRE Creation-date: 27-Oct-87 15:56:59
Edit-checked: No Document: No TCO-tested: Yes
Maintenance-release: No Hardware-related: No
Program: Monitor
Routines-affected: ENQ
Related-TCO: 7.1072
Problem:
FSPSCC BUGHLTs, FSPPRE BUGHLTs, FSPBPC BUGHLTs, FSPBND BUGHLTs and jobs hung
in the ENQ related JSYSes.
Diagnosis:
The code which manages the EQLBLT in routine LOKREL was wrong and was
corrupting the hash table and the various ENQ blocks.
Solution:
Fix the code in LOKREL to do the removal from EQLBLT correctly.
[End of TCO 7.1088]
TCO-number: 7.1090
Written-by: RASPUZZI Creation-date: 28-Oct-87 10:14:08
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: CLUPAR CLUDGR CLUFRK
Problem:
Cluster SYSTAT appears to be somewhat slow on a loaded cluster. This
may be an offshoot of the fact that CLUDGR sends over the maximum
number of words per SCA message, when, in fact, it could send over
fewer words.
Diagnosis:
When CLUDGR calls SC.SMG, it always sends C%MUDA words across the
CI. This means that if there is only 1 data word to be sent, CLUDGR
will send far too many words across the wire.
Solution:
Make the send routines in CLUDGR and CLUFRK figure out how many words
to send over and send no more than necessary. This will help increase
performance but cluster SYSTAT may still be slower than a local one.
[End of TCO 7.1090]
TCO-number: 7.1094
Written-by: RASPUZZI Creation-date: 28-Oct-87 14:08:00
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: CLUFRK CLUDGR
Problem:
INFO% returns a remote failure of "Structure is not mounted" when
the structure is mounted on the remote system.
Diagnosis:
Routine CLFMSR that handles the remote execution of the MSTR% is not
getting the structure name from the right place in the data area.
Solution:
Teach CLFMSR to get structure name from correct spot. Also change part
of INFMSR to return structure physical ID to user address space correctly.
[End of TCO 7.1094]
TCO-number: 7.1095
Written-by: MCCOLLUM Creation-date: 28-Oct-87 14:13:06
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: GTJFN
Problem:
WILD% returns "Structure is dismounted" when a JFN is compared against
another which is parse-only.
Diagnosis:
The recent changes to the JFN block when a parse-only JFN is created
have caused CHKJFN to return an incorrect error message to WILD%.
Solution:
Set up the JFN block for parse-only JFNs as it was before.
[End of TCO 7.1095]
TCO-number: 7.1096
Written-by: LOMARTIRE Creation-date: 28-Oct-87 14:13:51
Edit-checked: No Document: No TCO-tested: Yes
Maintenance-release: No Hardware-related: No
Program: Monitor
Routines-affected: ENQSRV GLOBS CFSSRV
Related-TCO: 7.1072
Problem:
SKDPF1 and PITRAP BUGHLTs from routine ENQRSV.
Diagnosis:
This routine was being called at CI interrupt level or scheduler level and was
trying to set a bit in the Lock-Blocks. However, these reside in swappable
free space and so could be swapped out.
Solution:
Change the algorithm for reacting to a cluster state change. Add flag EQCSTF
which will be set to -1 by CFSRSV. The ENQ Resched fork will notice that
EQCSTF is non-zero and will call ENQRSV to set bit EN.SDO in each Lock-Block on
the system. Since the ENQ Resched fork runs at process context, this will
avoid the BUGHLTs.
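The flag handoff described above can be sketched in Python. The class and
method names are hypothetical, and clearing the flag before doing the work is
an assumption; the TCO only says that the flag is set to -1 and that the
Resched fork notices it is non-zero.

```python
class EnqState:
    """Hypothetical model of the EQCSTF handoff between two contexts."""

    def __init__(self, lock_blocks):
        self.lock_blocks = lock_blocks   # dicts standing in for Lock-Blocks
        self.eqcstf = 0                  # stands in for flag EQCSTF

    def on_cluster_change(self):
        """Interrupt or scheduler level: set the flag, touch nothing swappable."""
        self.eqcstf = -1

    def resched_pass(self):
        """Process context: do the deferred work if the flag is set."""
        if self.eqcstf == 0:
            return False
        self.eqcstf = 0                  # assumption: flag is cleared before the work
        for block in self.lock_blocks:
            block["EN.SDO"] = True       # safe here, page faults are allowed
        return True
```

The high-priority side does a single store to resident memory, and the work
that may page fault is left to code that is allowed to fault.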
[End of TCO 7.1096]
TCO-number: 7.1098
Written-by: MCCOLLUM Creation-date: 29-Oct-87 15:03:38
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: STG
Problem:
The Internet host table is too big for TOPS-20.
Diagnosis:
The table size needs to be increased.
Solution:
Increase NHOSTS to 7001, decimal.
[End of TCO 7.1098]
TCO-number: 7.1102
Written-by: RASPUZZI Creation-date: 29-Oct-87 16:03:30
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: CLUDGR
Problem:
System going down messages appear on nodes that have cluster
sendalls disabled.
Diagnosis:
<SYSTEM>INFO runs under job 0 and that's when the system going
down messages are broadcast. CHKPID in CLUDGR assumes that this
process is part of GALAXY and allows the send to happen.
Solution:
Remove the check to see if the PID belongs to <SYSTEM>INFO. If this
process ever needs to do an INFO% JSYS or cluster TTMSG%, then this
will have to be changed. Also note that any GALAXY process running
under job 0 will cause the system going down messages to go through
to nodes that have cluster sendalls disabled. It is not advisable to
run GALAXY jobs under job 0 though.
[End of TCO 7.1102]
TCO-number: 7.1103
Written-by: LOMARTIRE Creation-date: 29-Oct-87 16:11:28
Edit-checked: No Document: No TCO-tested: Yes
Maintenance-release: No Hardware-related: No
Program: Monitor
Routines-affected: ENQSRV
Related-TCO: 7.1072
Problem:
Cluster-wide ENQC% fails for locks described by a user code.
Diagnosis:
Routine HDRFIP is not correctly putting the user code into the VRQA so
it is not getting sent to the other systems. This prevents the other systems
from finding the lock and reporting the status.
Solution:
Make HDRFIP get the user code from the user's block and not from P2.
[End of TCO 7.1103]
TCO-number: 7.1104
Written-by: RASPUZZI Creation-date: 29-Oct-87 16:15:08
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: CLUFRK
Problem:
1) The CLUDGR fork cannot execute the MTOPR% JSYS correctly.
Diagnosis:
The MTOPR% JSYS assumes that when MTOPR% is used from any non-zero
section that T1 will contain an OWGB instead of a universal device
designator.
Solution:
Write gross code that makes the CLUDGR fork execute the MTOPR% in
section 0.
[End of TCO 7.1104]
TCO-number: 7.1105
Written-by: RASPUZZI Creation-date: 29-Oct-87 16:18:00
Edited-by: RASPUZZI Edit-date: 3-Nov-87 13:49:34
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: CLUFRK
Problem:
The CLUDGR fork still does not return class scheduling information
correctly.
Diagnosis:
Brain dead TCO 7.1087 uses AVALON instead of CLASSF when checking
for class scheduling.
Solution:
Redo the TCO once again and make the code do the right thing.
[End of TCO 7.1105]
TCO-number: 7.1107
Written-by: GSCOTT Creation-date: 2-Nov-87 11:34:23
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: MEXEC
Problem:
Jobs occasionally get hung in a DOBE% at LOGDOB.
Diagnosis:
Characters in output buffer but terminal line is disconnected, causing
the job to hang forever in the DOBE.
Solution:
Since this DOBE is really meant for the use of something besides .NULIO
in LOGDES, we really don't want to do the DOBE unless T1 matches LOGDES.
[End of TCO 7.1107]
TCO-number: 7.1108
Written-by: GSCOTT Creation-date: 3-Nov-87 14:42:53
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: DOB
Problem:
PSINSKs among other things when using DOB.
Diagnosis:
When the DOB lock is locked we go CSKED. We can't get a page failure
while CSKED, and the symbol table and bug list storage is swappable.
Solution:
When getting the DOB lock go ECSKED, then when releasing it go CSKED first.
This allows us to page fault inside of the DOB JSYS.
[End of TCO 7.1108]
TCO-number: 7.1111
Written-by: LOMARTIRE Creation-date: 4-Nov-87 12:31:46
Edit-checked: No Document: No TCO-tested: Yes
Maintenance-release: No Hardware-related: No
Program: Monitor
Routines-affected: STG
Related-TCO: 7.1072
Problem:
ENQ% and ENQC% JSYSes hang. The ENQ reply fork on the remote systems hangs
in a loop and does not return any replies.
Diagnosis:
Routine LOCLOK uses the hash index as a starting point for the search of the
lock block described in the VRQA. Unfortunately, HSHLEN is defined as
3*NJOBS so it can be different on various systems depending upon the number
of jobs. So, the hash index is invalid and the ENQ reply fork loops while in
the search.
Solution:
Make HSHLEN be 605 octal (which is 389 decimal, which is prime) on each system so
that the hash index is valid on all nodes. This means that HSHLEN must be
defined to be the same value on every 7.0 system in the cluster!
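The arithmetic and the reason for a fixed table length can be checked with a
short Python sketch. The modulo hash below is a hypothetical stand-in, since
the TCO does not show the monitor's hash function, and the NJOBS values are
made up for the example.

```python
HSHLEN = 0o605   # octal 605, as given above


def is_prime(n):
    """Trial division; adequate for small table lengths."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def hash_index(key, table_len):
    """Hypothetical hash: reduce a lock key modulo the table length."""
    return key % table_len


# With the old definition (3*NJOBS) two nodes built with different NJOBS
# compute different indices for the same key, so an index sent in a message
# is not valid on the receiving node.
old_a = hash_index(1000, 3 * 100)   # node built with NJOBS = 100 (example value)
old_b = hash_index(1000, 3 * 128)   # node built with NJOBS = 128 (example value)

# With a fixed HSHLEN every node computes the same index.
new_a = hash_index(1000, HSHLEN)
new_b = hash_index(1000, HSHLEN)
```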
[End of TCO 7.1111]
TCO-number: 7.1112
Written-by: MCCOLLUM Creation-date: 4-Nov-87 13:58:37
Edited-by: MCCOLLUM Edit-date: 4-Nov-87 14:00:05
Edit-checked: No Document: Yes TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: DISC DSKALC FLINI FUTILI GLOBS JSYSA
JSYSF MEXEC MSTR PHYKLP PHYSIO PROLOG
STG SYSERR
Problem:
In a clustered environment, there is no way for multiple systems to
share the same username data base.
Diagnosis:
The username data base always comes from the boot structure. Since this
structure must be a massbus device, the shared MSCP disks cannot be used.
Solution:
Add a new startup routine, FNDLGS, to DSKALC to find a separate structure
to use as a Login Structure. This routine will be called after the CI-20
has been started and the MSCP disks are available to the system. This
routine will try to find a complete structure that has the local system's
CPU number saved in the home block words HOMLS1, HOMLS2, HOMLS3, or HOMLS4.
If found, the structure will be manually mounted and the STRTAB index
for it will be saved in global cell LGSIDX. Routines needing to get to
the username data base can use the contents of this cell as the structure
number for the Login Structure. If no separate Login Structure is found,
this cell will contain zero, indicating that the Login Structure is the
same as the Boot Structure.
This TCO involves utility changes to CHECKD, SETSPD, ACTGEN, and GALAXY to
support it.
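The search that FNDLGS is described as doing can be sketched in Python. The
data layout (dictionaries with "complete" and "home" keys) and the convention
that candidate structures are numbered from 1 are assumptions for this
illustration; only the four home block word names and the meaning of a zero
result come from the text above.

```python
HOME_WORDS = ("HOMLS1", "HOMLS2", "HOMLS3", "HOMLS4")


def find_login_structure(structures, cpu_number):
    """Return the index of the Login Structure, or 0 if there is none.

    A structure qualifies if it is complete and any of its four home block
    words holds this system's CPU number. Zero means that the boot structure
    doubles as the Login Structure.
    """
    for index, structure in enumerate(structures, start=1):
        if not structure["complete"]:
            continue
        home = structure["home"]
        if any(home.get(word) == cpu_number for word in HOME_WORDS):
            return index
    return 0
```

Having four home block words lets one shared structure name several systems,
which fits the stated goal of several systems sharing one username data base.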
[End of TCO 7.1112]
TCO-number: 7.1114
Written-by: RASPUZZI Creation-date: 4-Nov-87 16:25:15
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: CLUFRK
Problem:
ILMNRFs and KLPHOGs out when using the INFO% JSYS.
Diagnosis:
Remote system sending back incorrect information for the .INLNS
function. Specifically, the number of words sent back is off by
a large count and the system thinks more SCA buffers came in to
the local system than really did. This causes PHYKLP's port queues
to get so fouled up, you don't want to even consider looking at
the dump.
Solution:
Change routine CLFLNS to correctly calculate the number of words to
send back to the remote system.
[End of TCO 7.1114]
TCO-number: 7.1115
Written-by: LOMARTIRE Creation-date: 5-Nov-87 14:56:07
Edited-by: LOMARTIRE Edit-date: 5-Nov-87 15:35:10
Edit-checked: No Document: No TCO-tested: Yes
Maintenance-release: No Hardware-related: No
Program: Monitor
Routines-affected: ENQ ENQSRV GLOBS CFSSRV
Related-TCO: 7.1072
Problem:
Various hangs in ENQ%/ENQC%. Incorrect vote replies being returned.
Diagnosis:
There are two problems. First, the VRQA is used by both the interrupt level
routine EQMSG and by the process level routine EQATOP. EQMSG assembles an
incoming Request Message Set into the VRQA to be processed by the Vote
Responder routines. EQATOP is the fork which calls the appropriate Vote
Responder routine to send the reply to the vote request in the VRQA. There is
no way now for the Responder routines to guarantee that the VRQA stays stable
through cluster state changes. So, this could cause problems during the
formation of the reply.
The second problem is that the setting of bit VPRTY is now in a process
context routine. This bit is set when the cluster changes state and will allow
a fork in EVWAIT to wake up and try the vote again. But, with the bit setting
in process context, it can potentially be blocked from occurring thus hanging the
job in ENQ%/ENQC% waiting for a vote reply to return which will never come.
Solution:
Create a new data structure called the Vote Answer Area (VANA). When EQATOP
is run to send a reply, it will copy the VRQA into the VANA. All the Vote
Responder routines will work with the VANA in deciding the correct response.
Also, the setting of bit VPRTY in the VRPA must be moved into CFSSRV so that
state changes are correctly handled. Finally, add word EQLBCT to keep the
count of the number of Lock-Blocks on the system (they will be linked on the
EQLBLT).
[End of TCO 7.1115]
TCO-number: 7.1116
Written-by: GSCOTT Creation-date: 6-Nov-87 10:49:27
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: PHYM78
Problem:
TM78's KDB active bit is set with no active UDBs. Support of TU79s
makes some BUG messages inappropriate since they talk about TU78s.
Kontroller reset failures are not logged or noticed.
Diagnosis:
Somewhere the TM78's active bit is not getting cleared.
Nobody thought you could have a follow on to the TU78.
Kontroller resets usually work.
Solution:
If the TM78 isn't active and the active bit is on, reset it and BUGCHK.
Change textual messages from "TU78" to "tape drive".
BUGCHK if the kontroller reset didn't work.
[End of TCO 7.1116]
TCO-number: 7.1117
Written-by: GSCOTT Creation-date: 6-Nov-87 10:54:14
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: GLOBS MEXEC PHYSIO STG
Problem:
Monitor too big and too slow.
Diagnosis:
Historic code exists in MEXEC and other places that appears to be leftover
from pre PHYSIO days (e.g. Tenex).
Solution:
Remove code in MEXEC (CHKR) and PHYSIO (DRMINI) that shouldn't be there.
Some time in the future more of the drum stuff could be ripped out but
this will help **PERFORMANCE**, so let's do this part now.
[End of TCO 7.1117]
TCO-number: 7.1119
Written-by: GSCOTT Creation-date: 6-Nov-87 14:38:50
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: MEXEC
Problem:
Monitor too big.
Diagnosis:
Use of literal ASCIZ/CRLF/ when CRLF is in STG.
Solution:
Teach engineer to use CRLF in STG rather than a literal at LOGCR.
[End of TCO 7.1119]
TCO-number: 7.1120
Written-by: WADDINGTON Creation-date: 6-Nov-87 16:55:51
Edit-checked: No Document: Yes TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: latsrv
Problem: TOPS-20 can't talk to reverse-lat devices, such as LN03's.
Diagnosis: TOPS-20 uses the LAT 5.0 protocol. Reverse-lat requires the
LAT 5.1 protocol.
Solution: Upgrade TOPS-20 to LAT 5.1 protocol.
[End of TCO 7.1120]
TCO-number: 7.1121
Written-by: RASPUZZI Creation-date: 10-Nov-87 14:52:29
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: CLUDGR
Problem:
None seen but CLDISC and CLUCONs are annoying.
Diagnosis:
Why see these if they are annoying?
Solution:
Put CLDISC and CLUCON under CIBUGX. Also, change CLORBF to a BUGINF
and put it under CIBUGX. Now watch things start to break in massive
quantities.
[End of TCO 7.1121]
TCO-number: 7.1122
Written-by: MCCOLLUM Creation-date: 10-Nov-87 15:55:19
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: MSTR
Problem:
MOUNTR refuses mount requests with "insufficient MOUNTR resources"
after a few forced dismounts of disk structures.
Diagnosis:
When MSTR% is called to dismount a structure, it sends an IPCF message
to MOUNTR informing it of the structure dismount. MOUNTR scans its
account blocks looking for users of that structure. When the blocks
are located, accounting is done and the blocks are removed. However,
MSTR% is not providing the correct structure unique code in this block
due to a problem with the placement of a TRVAR in the JSYS code. This
causes MOUNTR to not find the matching account blocks and therefore
they are never released. If this scenario is repeated often enough,
MOUNTR runs out of space for the account blocks.
Solution:
Fix MSTR% to send the correct structure unique code to MOUNTR when a
structure dismount is done.
[End of TCO 7.1122]
TCO-number: 7.1125
Written-by: GSCOTT Creation-date: 10-Nov-87 16:57:38
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: DOB
Problem:
DOB% JSYS function .DBTIM doesn't work and returns DOBX08.
Diagnosis:
Several problems; one is that the correct offset into the argument
block for .DBTIM was not used, the second is that the argument block
was not checked using the right symbol; the third is that the wrong
AC was being used to range check the timeout value.
Solution:
Replace 4 of 8 lines in DBTIM routine in DOB.
[End of TCO 7.1125]
TCO-number: 7.1126
Written-by: MCCOLLUM Creation-date: 10-Nov-87 17:05:17
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: JSYSA
Problem:
The TMON% .SFLGS will not accurately reflect the state of the Login
Structure feature if someone disables Login Structures after the system
has been booted.
Diagnosis:
The TMON% .SFLGS function simply returns the value of the Login Structure
flag. Once the system has been booted, the Login Structure flag has no
meaning. Therefore, if the state of the feature is changed through SMON%,
the TMON% will no longer be reliable.
Solution:
Instead of returning the value of the LGSFLG, return the negative of the
LGSIDX. LGSIDX will contain 1 if the system is running with a Login
Structure, and 0 otherwise. TMON% will then show whether or not a Login Structure
is in use.
[End of TCO 7.1126]
TCO-number: 7.1127
Written-by: MCCOLLUM Creation-date: 11-Nov-87 10:24:20
Edit-checked: No Document: Yes TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: SCSJSY
Problem:
OPERATOR jobs cannot execute the SCS% JSYS. New GALAXY components
will not run under user OPERATOR.
Diagnosis:
The SCS% JSYS requires WHEEL, MAINTENANCE, or ARPANET-WIZARD privileges.
When the OPERATOR account is created at structure initialization time,
it is built with only OPERATOR privileges.
Solution:
Change SCS% to allow users with OPERATOR privileges enabled to execute
the JSYS.
[End of TCO 7.1127]
TCO-number: 7.1129
Written-by: MCCOLLUM Creation-date: 11-Nov-87 10:59:20
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: MSTR
Problem:
ILMNRF BUGHLTs when doing an MSTR% .MSRUS function.
Diagnosis:
.MSRUS and .MSRNU share code. When .MSRUS is done, the address of the
UDB for the given unit is not saved in the MSTUDB STKVAR. If the unit
goes offline during the function processing, an ILMNRF BUGHLT can result
if the call to MSTRHB fails. .MSRNU saves the UDB address and therefore
does not have this problem.
Solution:
Move the instruction that saves the UDB address for .MSRNU to common code
so that it is saved in all cases.
[End of TCO 7.1129]
TCO-number: 7.1130
Written-by: MCCOLLUM Creation-date: 11-Nov-87 12:06:12
Edited-by: MCCOLLUM Edit-date: 11-Nov-87 12:15:51
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: SCLINK SCJSYS
Problem:
ILMNRF BUGHLTs out of routine NETSQX in SCJSYS.
Diagnosis:
The PTSTS field of the port block contains invalid data. NETSQX uses
this address to index into a dispatch table. The bad value in PTSTS
causes NETSQX to dispatch into a random monitor location.
PTSTS is set up by copying the SAAST of the SAB. This cell is being
corrupted earlier by a call to routine SCTNSF when the channel is
in a transition state. The port block and the SAB exist, but the link
block has been cleaned up. SCTNSF calls routine SCTNIS which detects
that the link block no longer exists. This routine then calls SCTNIX
which expects the pointer to the link block to be negative if there is
no link block. But it is zero and SCTNIX proceeds to store a bogus
value in SAAST (actually the address of the SJB) which subsequently is
copied into PTSTS.
Solution:
If a zero value is passed to SCTNIX as the value of the link block,
just leave SAAST as zero. Fix up the dispatch table in NETSQX to
return a device or data error when the PTSTS value is zero.
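
The dispatch-table part of this fix can be illustrated with a small sketch.
This is Python, not monitor code, and it only models the idea: the handler
names, the state values other than zero, and the error string are all
hypothetical. The range check is an extra precaution added for the sketch
and is not something the TCO describes.

```python
# Hypothetical sketch: a dispatch table indexed by a port status value.
# Slot 0 is reserved for "no valid state" so that a zero index returns
# an error instead of dispatching to an arbitrary location.

def state_error(port):
    # Entry used when PTSTS is zero: report an error, do not dispatch.
    return "device-or-data-error"

def state_connected(port):
    return "ok"

def state_disconnecting(port):
    return "closing"

# Index 0 is the guard entry; the other indices are made-up states.
DISPATCH = [state_error, state_connected, state_disconnecting]

def dispatch_on_status(port):
    status = port.get("PTSTS", 0)
    # Assumption for this sketch: also range-check the index so that a
    # corrupted value cannot select a handler outside the table.
    if not 0 <= status < len(DISPATCH):
        return state_error(port)
    return DISPATCH[status](port)
```

With this layout a zero PTSTS takes the error path, which matches the
behavior the Solution asks for.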
[End of TCO 7.1130]
TCO-number: 7.1132
Written-by: RASPUZZI Creation-date: 12-Nov-87 15:06:41
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: CLUFRK
Problem:
Cluster SYSTAT still a little on the slow side when systems in
the cluster get loaded.
Diagnosis:
What do you want from a KL? It's doing its best. However, it would
help to give the CLUDGR fork a priority boost since it really is not
a hog.
Solution:
Light JP%SYS in JOBBIT for CLUDGR fork's PSB and make it run about
the same priority as CHKR.
[End of TCO 7.1132]
TCO-number: 7.1133
Written-by: GSCOTT Creation-date: 12-Nov-87 16:07:24
Edited-by: GSCOTT Edit-date: 12-Nov-87 16:08:25
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: DOB
Problem:
Can you say LOKODR? Can you say CSKBUG?
Diagnosis:
We should have used the LOCK and UNLOCK macros rather than LOKK and UNLOKK.
Solution:
Use LOCK and UNLOCK macros making things a lot smaller and quicker not to
mention bug free.
[End of TCO 7.1133]
TCO-number: 7.1134
Written-by: GSCOTT Creation-date: 12-Nov-87 16:10:31
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: PHYKLP
Problem:
PHYKLP doesn't know about the new IPALOD created by that extra fine program
IPAGEN.
Diagnosis:
Code not cleaned up from 6.1 days.
Solution:
(1) Check microcode version when the microcode is loaded into resident
storage by PHYKLP. (2) When read counters is done to get the version make
sure it reports the same version as the one we think we loaded. If not then
BUGINF. (3) Clean up comments relating to IPALOD and IPADMP.
[End of TCO 7.1134]
TCO-number: 7.1138
Written-by: LOMARTIRE Creation-date: 16-Nov-87 14:22:14
Edit-checked: Yes Document: No TCO-tested: Yes
Maintenance-release: No Hardware-related: No
Program: Monitor
Routines-affected: STG GLOBS CFSSRV ENQPAR ENQSRV
Related-TCO: 7.1072
Problem:
CFSSRV won't build anymore!
Diagnosis:
MACRO complains of "not enough core" if CFSSRV searches ENQPAR.
Solution:
Reorganize code so that CFSSRV no longer has to search ENQPAR.
[End of TCO 7.1138]
TCO-number: 7.1141
Written-by: RASPUZZI Creation-date: 18-Nov-87 10:53:46
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: CLUDGR
Problem:
ILMNRFs when ACJ denies the use of the INFO% JSYS.
Diagnosis:
The GTOKM macro can only be used in section 1. INFO% is in section
6. It attempts to execute the GTOKM macro through the use of another
macro, S1XCT. Unfortunately, the address to return to upon ACJ denial
is given as an address in section 6. Since the denial happened out
of section 1, the JRST only looks at the low 18 bits and winds up in
the weeds.
Solution:
In the GTOKM macro, change it so that when ACJ denies INFO% requests
it will jump back to section 6 and ITERR back to the user.
[End of TCO 7.1141]
TCO-number: 7.1142
Written-by: LOMARTIRE Creation-date: 18-Nov-87 13:07:14
Edit-checked: No Document: No TCO-tested: Yes
Maintenance-release: No Hardware-related: No
Program: Monitor
Routines-affected: ENQSRV
Related-TCO: 7.1072
Problem:
There are still jobs which hang in ENQ%/DEQ%/ENQC%.
Diagnosis:
After a cluster state change, all Lock-Blocks on the system are rescheduled.
Also, any vote in progress is marked for a retry. Finally, any answer which is
being made is aborted. It is possible for a retried vote to arrive on a node,
then have that node notice that the cluster changed state and abort the
answering process. Therefore, the sending node will be hung waiting for a
reply which will never arrive.
Solution:
Do not abort answering attempts after a cluster state change. Should the CALL
to SC.SMG fail in routine ANSWER, check the error code. If it was SCSNEC (not
enough credit), retry the answer attempt. If it should fail for any other
reason, or should it succeed, check the unique code of the vote we are replying
to (held in the VANA) against the one that EQMSG put in the left half of
EQFKFL. If they are the same, then we just replied to the correct vote request
so clear the left half of EQFKFL. Otherwise, some other vote request has
arrived and must be processed. Do not clear EQFKFL in this case.
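
The decision logic in this Solution can be sketched as below. This is a
Python illustration under stated assumptions, not the ENQSRV code: `send`
stands in for the CALL to SC.SMG, the bounded retry count `max_tries` is
invented for the sketch (the TCO does not say whether retries are bounded),
and the return value models only the left half of EQFKFL.

```python
SCSNEC = "SCSNEC"  # "not enough credit" error code named in the TCO

def answer(send, vana_code, eqfkfl_left, max_tries=10):
    """Return the new left-half value of EQFKFL after an answer attempt.

    send        -- callable standing in for SC.SMG; returns None on
                   success or an error code string on failure
    vana_code   -- unique code of the vote being answered (held in VANA)
    eqfkfl_left -- unique code EQMSG stored in the left half of EQFKFL
    max_tries   -- hypothetical retry bound, not taken from the TCO
    """
    for _ in range(max_tries):
        err = send()
        if err != SCSNEC:
            break  # success, or a failure that is not worth retrying
    # Success or failure, compare the unique codes. If they match, the
    # reply was for the current vote request, so clear the left half.
    if vana_code == eqfkfl_left:
        return 0
    # Otherwise another vote request arrived meanwhile; leave it set.
    return eqfkfl_left
```

Note that the answer is no longer abandoned because of a cluster state
change; only the unique-code comparison decides whether the flag is cleared.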
[End of TCO 7.1142]
TCO-number: 7.1144
Written-by: GSCOTT Creation-date: 19-Nov-87 15:57:09
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: DOB
Problem:
Engineer confused about structure offline bit.
Diagnosis:
MS%OFL is returned by MSTR; MS%OFS is in SDBSTS.
Solution:
Use MS%OFS in CKSTR in DOB.
[End of TCO 7.1144]
TCO-number: 7.1145
Written-by: LOMARTIRE Creation-date: 20-Nov-87 07:08:39
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: Monitor
Routines-affected: ENQSRV
Related-TCO: 7.1072
Problem:
The ENQ Answer fork and ENQ Resched fork are not as responsive as they
could be.
Diagnosis:
When they are created, they are not given any special consideration like
the other special system forks.
Solution:
Make both forks a special system fork (JP%SYS) and ensure that they fall
no lower than scheduler queue 1.
[End of TCO 7.1145]
TCO-number: 7.1147
Written-by: MCCOLLUM Creation-date: 24-Nov-87 15:33:11
Edit-checked: No Document: Yes TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: EXECSE STG
Problem:
The offline structures default timeout interval is too large.
Diagnosis:
It is now 60 seconds; a lower value is more reasonable.
Solution:
Modify STG to make the default value 5 seconds. Also change the EXEC to
set it to five seconds when no value is supplied in the
^ESET OFFLINE-STRUCTURES command.
[End of TCO 7.1147]
TCO-number: 7.1148
Written-by: GSCOTT Creation-date: 24-Nov-87 16:57:40
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: PHYSIO
Problem:
When a disk drive is dual ported to two RH20s per system the monitor makes
the primary UDB the first path it finds to the drive, and the secondary
UDB the second path it finds to the drive. This lowers performance when
several drives are dual ported in this manner since (1) the monitor always
does seeks on the primary UDB and (2) the monitor always tries the transfer
on the primary UDB first.
Diagnosis:
Monitor should allocate the primary UDBs across the channel in a fair
fashion. Routine PHYDUA in PHYSIO is stupid.
Solution:
Allocate odd numbered drives to odd numbered RH20 and even numbered drives
to even numbered RH20. This should make the primary/secondary split across
RH20s more efficient. Change PHYDUA to allocate drives in this manner.
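
The odd/even rule can be sketched as follows. This is a Python illustration
only; `primary_channel` is a hypothetical name, and the fallback used when
both RH20 numbers have the same parity is an assumption of the sketch, since
the TCO does not cover that case.

```python
def primary_channel(drive_number, channels):
    """Pick the RH20 whose number has the same parity as the drive.

    channels is a pair of RH20 numbers in the order the paths were found.
    """
    first, second = channels
    if first % 2 == second % 2:
        # Assumption: with two channels of equal parity the rule cannot
        # discriminate, so keep the first path found.
        return first
    return first if first % 2 == drive_number % 2 else second

# Drives 0-3 dual ported between RH20 0 and RH20 1 alternate primaries.
assignment = [primary_channel(d, (0, 1)) for d in range(4)]
```

Alternating primaries this way spreads seeks and first-try transfers over
both channels instead of loading one of them.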
[End of TCO 7.1148]
TCO-number: 7.1149
Written-by: WADDINGTON Creation-date: 25-Nov-87 13:44:37
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: FILMSC
Problem: CLOSF causes reverse LAT tty's to go away.
Diagnosis: In TTYCLZ, we call TTHNGU to lower DTR. TTHNGU does a TDCALL to
LTHNGU which terminates the LAT connection, and invokes the cleanup code to
blow away the TTY.
Solution: In TTYCLZ, only call TTHNGU for RSX20F lines.
[End of TCO 7.1149]
TCO-number: 7.1150
Written-by: WADDINGTON Creation-date: 25-Nov-87 15:36:27
Edited-by: LOMARTIRE Edit-date: 13-Dec-87 15:27:46
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Problem: Reverse LAT requests that are queued by the server and then canceled,
are not cleaned up properly. They remain in the server's queue.
Diagnosis: In LATSRV, routine FMCAN is called to format a command message.
The code in FMCAN is just plain wrong.
Solution: Rewrite FMCAN to properly format a cancel command message.
[End of TCO 7.1150]
TCO-number: 7.1151
Written-by: RASPUZZI Creation-date: 1-Dec-87 14:54:44
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: COMND
Problem:
STI% in COMND% is causing partial recognition to beep after
completing a keyword or switch.
Diagnosis:
The STI% appears to be extraneous.
Solution:
Either NOP the STI% or remove it totally. Since we must preserve
0/1 space, I suggest removing it.
[End of TCO 7.1151]
TCO-number: 7.1155
Written-by: WADDINGTON Creation-date: 3-Dec-87 14:48:50
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: latsrv
Problem:
1) LATNSC Type 2 and 12
2) Bogus PRSTS value if pending request is queued by the server
3) Missing Test of Master/Slave Bit in MSGNXT
Diagnosis:
1) MSGSID got transposed into MSGDID when we merged TOPS-10/TOPS-20 LATSRVs.
2) Developer needs a vacation. Did a LOAD AC,LITERAL instead of
MOVX AC,LITERAL.
3) The LAT 5.1 spec says that all messages from Servers will have the
Master/Slave bit lit. This is not true. In particular, the bit is clear on
Response Information Messages and Status Messages. Consequently, Reverse
LAT wouldn't work at all because we ignored all messages which had the bit
clear. We commented out the test until we better understood the problem.
Solution:
1) Change MSGDID to MSGSID at MTTSTP+5 lines.
2) Change LOAD T1,.LAQUE to MOVX T1,.LAQUE near the bottom of RDSTA.
3) Now that we understand the problem, restore the test in NXTMSG, but allow
Response Information messages and Status messages in.
[End of TCO 7.1155]
TCO-number: 7.1156
Written-by: RASPUZZI Creation-date: 3-Dec-87 14:54:12
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: PHYKLP
Problem:
Can't build debug=1 monitor.
Diagnosis:
Misspelled word in PHYKLP under debug conditional.
Solution:
Win spelling bee.
[End of TCO 7.1156]
TCO-number: 7.1158
Written-by: GSCOTT Creation-date: 10-Dec-87 14:37:29
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: APRSRV
Problem:
BUGNAM doesn't get set right if you have a problem calling SEBCPY.
Diagnosis:
If there is some problem queueing the ERROR.SYS block then we don't store
the BUGNAM.
Solution:
Store the BUGNAM early on when doing a BUGHLT. This is the only case when
the BUGNAM doesn't appear to be set right - it could be a problem with CHKs
and INFs, but this hasn't been seen yet. Also APRSRV lacks a TOC and
needs to be repaginated (I hate those messy page breaks).
[End of TCO 7.1158]
TCO-number: 7.1159
Written-by: RASPUZZI Creation-date: 10-Dec-87 14:44:12
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: COMND
Problem:
When user types an invisible keyword followed by an escape, the
monitor complains that his command input buffer is too small.
Diagnosis:
Partial recognition is deficient in this case. It is taking nulls
as the suffix (there is no suffix on invisible or norec keywords)
and depositing them infinitely in the user's command buffer until
space is exhausted.
Solution:
Teach COMND% to ignore norec keywords.
[End of TCO 7.1159]
TCO-number: 7.1162
Written-by: RASPUZZI Creation-date: 15-Dec-87 13:06:00
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: STG
Problem:
MONPDL BUGHLTs when using a debug=1 monitor.
Diagnosis:
BUGPDL is not big enough when debugging code is on.
Solution:
Increase BUGPDL in debug monitors but leave it the same in
regular monitors.
[End of TCO 7.1162]
TCO-number: 7.1164
Written-by: WADDINGTON Creation-date: 16-Dec-87 09:21:51
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: LATSRV
Problem: LAT maximum active circuits can be set higher than maximum allocated
circuits, resulting in a lot of LATNSCs. Also, users cannot customize the
default max active circuits. MINACB is a bad name for default Maximum
Allocated circuits blocks.
Diagnosis: The LCP>SET MAX CIRCUITS 50 command changes the maximum active
circuits (HNMAC) to the desired value, but does not change the maximum
allocated circuit blocks (HNMXC). The LATOP% JSYS calls to either set or
clear a parameter is done by executing an instruction out of the TBEXEC table,
but unfortunately this table just moves the new maximum value into HNMAC.
Solution: Change PARTB. for the set max circuits to jump to SETMAC, instead of
just storing the new value into HNMAC. Add routine SETMAC which got dropped
when we merged TOPS-10 and TOPS-20 sources. (We dropped it because it never
got called in 6.1 or 7.03)
Also, change symbol MINACB to MAXACB, add symbol MAXACC (for Max Active
Circuits), and move both symbols to STG so customers can customize them if
desired.
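
One plausible reading of what a SETMAC-style routine has to guarantee is
that the allocated limit never falls below the active limit. The Python
sketch below models only that invariant; the dictionary layout and the
function name are hypothetical, and the real routine may do more (for
example, allocate the circuit blocks themselves).

```python
def set_max_active(host, new_max_active):
    """Set max active circuits, raising the allocated limit if needed.

    host is a dict with the two cells named in the TCO:
      HNMAC -- maximum active circuits
      HNMXC -- maximum allocated circuit blocks
    A new dict is returned; the input is not modified.
    """
    host = dict(host)
    host["HNMAC"] = new_max_active
    # Keep the invariant HNMXC >= HNMAC so that an active circuit always
    # has a circuit block available to it.
    if host["HNMXC"] < new_max_active:
        host["HNMXC"] = new_max_active
    return host
```

Storing the new value into HNMAC alone, as the TBEXEC table entry did, is
what lets the two limits drift apart.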
[End of TCO 7.1164]
TCO-number: 7.1165
Written-by: RASPUZZI Creation-date: 17-Dec-87 16:26:29
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: JSYSA
Problem:
Cannot set SNOOP% breakpoints in non 0/1 sections. This is a
handicap for attempting to time monitor routines for performance
purposes.
Diagnosis:
SNOOP% uses a full 30 bit address but when it stores the breakpoint
in the breakpoint block, it steps on the section number with a
"JRST" OP code.
Solution:
Add another word to the breakpoint block SNPADR. This will contain
the 30 bit address of the breakpoint (to be used during insertion
and removal of the breakpoint). It is still OK to use 18 bit addressing
for JRSTs in the breakpoint block because it will be section relative
and the monitor will be in the appropriate section already.
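
The word layout behind this problem can be shown with a little arithmetic.
The sketch below is Python, not monitor code. It relies on the PDP-10
instruction format (36-bit word, 9-bit opcode in the high bits, 18-bit
address field in the low bits) and on JRST being opcode 254 octal; the
dictionary keys and the sample address are hypothetical.

```python
JRST = 0o254          # PDP-10 JRST opcode, a 9-bit field
HALF = 0o777777       # mask for an 18-bit half word
ADDR30 = 0o7777777777 # mask for a 30-bit extended address

def jrst_word(addr30):
    """Build a 36-bit JRST word. Only the low 18 address bits fit, so
    the section number of a 30-bit address is lost in this word."""
    return (JRST << 27) | (addr30 & HALF)

def make_breakpoint(addr30):
    # Keep the full 30-bit address in its own word (SNPADR in the TCO)
    # and let the JRST stay section relative.
    return {"SNPADR": addr30 & ADDR30, "JRST": jrst_word(addr30)}

# Hypothetical breakpoint in section 6 at in-section offset 123456.
bp = make_breakpoint((6 << 18) | 0o123456)
section = bp["SNPADR"] >> 18     # section number survives here
offset = bp["JRST"] & HALF       # the JRST keeps only the offset
```

Because the opcode and the section number would both need the left half of
the same word, a separate word is the simplest way to keep both.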
[End of TCO 7.1165]
TCO-number: 7.1167
Written-by: SHREFFLER Creation-date: 23-Dec-87 09:27:17
Edited-by: RASPUZZI Edit-date: 23-Dec-87 09:28:54
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: PAGEM GLOBS PAGUTL
Problem:
SWOFCT BUGCHKs
Diagnosis:
On long files when the SMAP% Monitor call unmaps a file section it should
check to see if the share count on the OFN is going to zero and if so it
should cache it. It doesn't do this. As a result uncached unshared index
blocks can be left lying around and the system may decide to swap them out
if memory gets tight. This causes SWOFCTs
Solution:
Have SMAP% call the OFN caching routine when it unmaps a file section and
sees the share count going to zero.
[End of TCO 7.1167]
TCO-number: 7.1168
Written-by: RASPUZZI Creation-date: 28-Dec-87 11:16:03
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: GTJFN MACSYM
Problem:
ASTJFN BUGHLTs.
Diagnosis:
GTJFN is working with a nonexistent file specification that contains
a wild card character. This is a no-no and can occur when a user types
"COPY TTY: FOO.BAR;*".
Solution:
Make GTJFN smart enough to realize that ";*" is an undefined file
attribute. We don't do VMS file specifications here.
[End of TCO 7.1168]
TCO-number: 7.1169
Written-by: RASPUZZI Creation-date: 31-Dec-87 10:27:17
Edited-by: RASPUZZI Edit-date: 31-Dec-87 10:32:10
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: TAPE
Problem:
GDSTS% and SDSTS% JSYS leave JFN locked when returning with an error out
of routines MTGTSX and MTSTSX.
Diagnosis:
Edit 7446 added an ITERR to prevent crashes. However, when this
path is taken, the JFN is never unlocked in either routine.
Solution:
Release file lock by calling UNLCKF before taking error return
out of MTGTSX and MTSTSX.
[End of TCO 7.1169]
TCO-number: 7.1172
Written-by: LOMARTIRE Creation-date: 6-Jan-88 13:54:12
Edit-checked: Yes Document: No TCO-tested: Yes
Maintenance-release: No Hardware-related: Yes
Program: Monitor
Routines-affected: ENQ ENQSRV
Related-TCO: 7.1072
Problem:
ILLUUO and ILFPTE BUGHLTs when an ENQ% is done on systems which do
not have a CI installed (or NOKLIP is set). Also, some strange results can
occur when ENQs and DEQs are done.
Diagnosis:
Location EQLBLT is initialized in routine EQSINI when SCA starts up
the SYSAPs. But, if there is no CI, then SCA is never started and EQLBLT is
never initialized. This causes problems in later ENQ%/DEQ% attempts.
Solution:
Move the initialization of EQLBLT and EQLBCT into routine ENQINI.
This routine is called at system startup.
[End of TCO 7.1172]
TCO-number: 7.1174
Written-by: WADDINGTON Creation-date: 13-Jan-88 09:01:50
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: latsrv
Problem: SKDPF1 while trying to print to LAT printers.
Diagnosis: Using T1 instead of SB. Not setting up T1 prior to call to PRWAKE.
CALL PRWAKE should be CALLRET PRWAKE.
Solution: Don't let the developer try to write code, cope with Thanksgiving,
Christmas, an impending baby, and start field test, all at the same time.
[End of TCO 7.1174]
TCO-number: 7.1175
Written-by: MCCOLLUM Creation-date: 13-Jan-88 09:51:02
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: fsprem
Problem:
A routine is needed in FREE.MAC to determine the amount of free space
left in a given extended space swappable pool.
Diagnosis:
As above.
Solution:
Add routine FSPREM. This routine takes as an argument the swappable free
space pool number (an index into FSPTAB) and returns the amount of free
space remaining in the pool in words.
[End of TCO 7.1175]
TCO-number: 7.1177
Written-by: MCCOLLUM Creation-date: 14-Jan-88 14:44:56
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: DTIND6
Problem:
DN60 front ends cannot be initialized.
Diagnosis:
Routine DTIND6 in DTESRV is called when the BOOT% .BTIPR function is used.
This function attempts to initialize DN60 protocol on a given DTE. This
routine attempts to acquire 1000 (octal) words from the section 0/1
resident free space general pool. However, the initialization is done
after system startup and the request fails. This is because the new DOB
feature of the monitor acquires 1000 words of the same free space at
system startup. DOB does not release these words and there are not
enough left in the pool to satisfy DTESRV's subsequent request.
Solution:
Increase the size of the section 0/1 resident free space general pool
by 1000 (octal) words. This is defined by the .RESGQ parameter in STG.
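
The arithmetic of the failure is easy to reproduce. The Python sketch below
is a toy model: the starting pool size `OLD_POOL` is invented purely for
illustration (the TCO does not give the real value of .RESGQ), and the real
pool serves other callers as well.

```python
REQUEST = 0o1000  # 1000 octal is 512 decimal words

def try_allocate(free_words, request=REQUEST):
    """Return (granted, words_left) for a simple first-come pool."""
    if request > free_words:
        return False, free_words
    return True, free_words - request

def startup_then_dn60(pool_words):
    """DOB allocates at startup and never releases; DTESRV asks later."""
    dob_ok, left = try_allocate(pool_words)
    dte_ok, left = try_allocate(left)
    return dob_ok, dte_ok, left

OLD_POOL = 0o1400  # hypothetical size, chosen only for illustration
before = startup_then_dn60(OLD_POOL)           # DTESRV request fails
after = startup_then_dn60(OLD_POOL + 0o1000)   # both requests succeed
```

Growing the pool by exactly the amount DOB holds restores the headroom that
DTESRV had before DOB was added.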
[End of TCO 7.1177]
TCO-number: 7.1179
Written-by: LOMARTIRE Creation-date: 14-Jan-88 15:33:49
Edited-by: LOMARTIRE Edit-date: 15-Jan-88 15:49:59
Edit-checked: Yes Document: No TCO-tested: Yes
Maintenance-release: No Hardware-related: No
Program: Monitor
Routines-affected: ENQ ENQSRV
Related-TCO: 7.1072
Problem:
When a vote is received for a cluster-wide non-file lock, the system will
respond even if it only has the lock in non-cluster-wide mode. This can cause
the cluster-wide request to fail incorrectly.
Also, vote requests are being sent out from interface routines EQPOOL
and EQLKSD for non-cluster-wide locks. This is simply extra work.
Diagnosis:
The EN.CLL bit must be used to represent a cluster-wide lock and not just a
cluster-wide file lock.
Solution:
Make EN.CLL represent a cluster-wide lock in the Lock-Block if EN.CLL is lit
in any of the associated Q-Blocks. Have LOCLOK return failure if the
Lock-Block is found on the system but EN.CLL is not lit. Have EQPOOL
and EQLKSD check EN.CLL before attempting to send the vote request. Routine
EQRSTS should always vote if the process has cluster-wide capabilities even
if the Lock-Block does not have EN.CLL set.
[End of TCO 7.1179]
TCO-number: 7.1180
Written-by: LOMARTIRE Creation-date: 14-Jan-88 16:52:53
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: Monitor
Routines-affected: ENQ GLOBS STG
Related-TCO: 7.1072
Problem:
FSPOUT BUGINFs during ENQ usage from the ENQ free pool.
Diagnosis:
The Lock-Block caching feature will make a large number of Lock-Blocks remain
on the system instead of being returned to the pool when the last Q-Block is
dequeued. The variable ENQSPC keeps track of the amount of ENQ free pool
space used and this amount is used to determine if garbage collection of
long-term locks should occur. It appears that ENQSPC can drift from the real
allocation and so garbage collection is incorrectly postponed. So, when an
attempt to get free space is made, the pool is exhausted, an FSPOUT results,
and a forced garbage collection is done. This will release the cached
(long-term) locks and provide enough space to continue. In the dumps seen so
far, ENQSPC is much greater than the real allocation and there are many
Lock-Blocks on EQLBLT.
Solution:
Remove ENQSPC and use the new FSPREM routine to determine the amount of space
left in the ENQ free pool.
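
The design point here is to ask the pool how much space remains instead of
keeping a separate counter that can drift. A minimal Python sketch of that
idea follows; the `Pool` class, its methods, and the `threshold` parameter
are all hypothetical, and `remaining` only plays the role that FSPREM plays
in the TCO.

```python
class Pool:
    """Toy free-space pool that tracks its own allocations."""

    def __init__(self, size):
        self.size = size
        self.blocks = {}       # handle -> words allocated
        self.next_handle = 1

    def get(self, words):
        # Refuse the request if the pool cannot satisfy it.
        if words > self.remaining():
            return None
        handle = self.next_handle
        self.next_handle += 1
        self.blocks[handle] = words
        return handle

    def free(self, handle):
        self.blocks.pop(handle, None)

    def remaining(self):
        # FSPREM-like query: derived from the pool itself, so it cannot
        # disagree with the real allocation.
        return self.size - sum(self.blocks.values())

def should_collect(pool, threshold):
    """Decide on garbage collection from the pool's own view of space."""
    return pool.remaining() < threshold
```

With a single source of truth, the garbage-collection decision is made on
the real allocation rather than on a shadow count such as ENQSPC.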
[End of TCO 7.1180]
TCO-number: 7.1181
Written-by: RASPUZZI Creation-date: 15-Jan-88 10:54:37
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: JSYSF
Problem:
The RFTAD% JSYS initializes one too many words in the user's
argument block.
Diagnosis:
TCO 7.1044 has a fencepost error in it.
Solution:
Adjust the count before calling BLTUU out of the RFTAD% JSYS
so it initializes the correct number of words.
[End of TCO 7.1181]
TCO-number: 7.1183
Written-by: GSCOTT Creation-date: 15-Jan-88 14:41:22
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: DATIME
Problem:
Monitor keeps running out of section 0/1 space.
Diagnosis:
Monitor needs code moved to section 6.
Solution:
Move DATIME into section 6. This gets a little over 2100 (octal) words out
of SWAPCD and into XSWAPCD.
[End of TCO 7.1183]
TCO-number: 7.1185
Written-by: MCCOLLUM Creation-date: 15-Jan-88 16:17:02
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: JNTMAN
Problem:
The available section 0/1 address space in the monitor is getting low.
Diagnosis:
Solution:
Move the NODE% JSYS and all other code in module JNTMAN to XCDSEC. This
gets back about 1400 (octal) words of section 0/1 space, mostly from NRCOD.
[End of TCO 7.1185]
TCO-number: 7.1186
Written-by: WADDINGTON Creation-date: 15-Jan-88 20:50:29
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: LATSRV
Problem: LATOP% Function .LASHC returns local job number. This is not overly
useful.
Diagnosis: LATOP% Function .LARHC calls routine LARHC. LARHC calls CKRHC.
CKRHC stashes a copy of JOBNO in the PR block. JOBNO is the local job number.
Solution: In CKRHC, use GBLJNO instead of JOBNO. In JBTHC, use GBLJNO instead
of JOBNO when determining whether to terminate all requests for a particular
job.
[End of TCO 7.1186]
TCO-number: 7.1190
Written-by: RASPUZZI Creation-date: 20-Jan-88 11:28:09
Edited-by: RASPUZZI Edit-date: 20-Jan-88 11:29:12
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: CFSSRV CFSPAR CFSUSR
Problem:
It is no longer possible to obtain a CREF of CFSSRV.
Diagnosis:
CFSSRV is just so big right now that MACRO can't handle all the
symbols that are generated.
Solution:
Like AT&T, split up CFSSRV. Put the user related stuff in CFSUSR
and put the JSYS code into SWAPCD. Leave the SCA and vote stuff in
module CFSSRV. Also, remove the MSKSTRs and DEFSTRs from CFSSRV and
put them in CFSPAR.
[End of TCO 7.1190]
TCO-number: 7.1191
Written-by: MCCOLLUM Creation-date: 21-Jan-88 10:26:18
Edited-by: MCCOLLUM Edit-date: 21-Jan-88 10:31:09
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: ARPSND
Problem:
None observed, but code looks wrong.
Diagnosis:
Replies to Address Resolution Protocol (ARP) requests are sent to the
Ethernet broadcast address rather than the sender's hardware address.
While this behavior does not specifically violate RFC 826 pertaining
to ARP, it does cause extra packets to be processed by all TCP/IP
nodes on the Ethernet for no gain.
Solution:
When replying to an ARP request, direct the reply only to the sender's
hardware address. ARP requests will continue to be sent to the broadcast
address as specified in the RFC.
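The change above can be sketched as follows. This is a minimal illustration
in Python, not the ARPSND code: the function names are hypothetical, and the
frame layout is the standard Ethernet/IPv4 ARP layout from RFC 826. The only
point is that the reply's Ethernet destination is the requester's hardware
address, while requests still go to the broadcast address.

```python
import struct

ETHERTYPE_ARP = 0x0806
ARP_REQUEST, ARP_REPLY = 1, 2
BROADCAST = b"\xff" * 6


def build_arp_frame(oper, src_mac, src_ip, dst_mac, target_mac, target_ip):
    """Return an Ethernet frame carrying one ARP packet (Ethernet/IPv4)."""
    ether = dst_mac + src_mac + struct.pack("!H", ETHERTYPE_ARP)
    # hardware type 1 (Ethernet), protocol 0x0800 (IP), lengths 6 and 4
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, oper)
    arp += src_mac + src_ip + target_mac + target_ip
    return ether + arp


def build_reply(request_frame, my_mac, my_ip):
    """Answer an ARP request by unicasting to the requester (hypothetical helper)."""
    sender_mac = request_frame[22:28]   # sender hardware address in the request
    sender_ip = request_frame[28:32]    # sender protocol address in the request
    return build_arp_frame(ARP_REPLY, my_mac, my_ip,
                           dst_mac=sender_mac,       # not BROADCAST
                           target_mac=sender_mac,
                           target_ip=sender_ip)
```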
[End of TCO 7.1191]
TCO-number: 7.1192
Written-by: EVANS Creation-date: 21-Jan-88 11:19:20
Edited-by: EVANS Edit-date: 21-Jan-88 14:33:33
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: JSYSF
Related-SPR: 21695
Related-QAR: 5
Problem: ILMNRF's during creation of long directory names.
Diagnosis: Edit 7416 to JSYSF didn't allow enough words for 39-character
device, 39-character directory. Long names overflow the space allotted
in JSB free space.
Solution: Increase the size of UDGRNM to 2*MAXLW+1 (17 words). In addition,
change the SOUT% which copies the name to terminate on 2*MAXLC+5
characters instead of null.
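The sizes quoted above can be checked with a little arithmetic. This is a
sketch under assumptions: the TCO only states that 2*MAXLW+1 is 17 words and
that names are 39 characters, so the values of MAXLC and the way MAXLW is
derived here (five 7-bit characters per 36-bit word, plus a terminating null)
are guesses that happen to reproduce those numbers.

```python
CHARS_PER_WORD = 5      # 7-bit ASCII packed five to a 36-bit word
MAXLC = 39              # assumed: longest device or directory name, in characters


def words_for(chars):
    """Words needed to hold `chars` characters plus a terminating null."""
    return -(-(chars + 1) // CHARS_PER_WORD)    # ceiling division


MAXLW = words_for(MAXLC)            # 8 words per name
UDGRNM_WORDS = 2 * MAXLW + 1        # 17 words, matching the TCO
SOUT_LIMIT = 2 * MAXLC + 5          # 83 characters copied at most
```

With these values the SOUT% limit of 83 characters fits inside the 85
characters that 17 words can hold.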
[End of TCO 7.1192]
TCO-number: 7.1193
Written-by: MCCOLLUM Creation-date: 21-Jan-88 14:46:18
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: PHYMSC
Problem:
Too many MSCCDF BUGINFs.
Diagnosis:
These aren't worth seeing. When they occur there are other cluster problems
that make themselves obvious.
Solution:
Save some paper. Put MSCCDF under CIBUGX.
[End of TCO 7.1193]
TCO-number: 7.1194
Written-by: GSCOTT Creation-date: 22-Jan-88 10:52:56
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: POSTLD
Problem:
POSTLD doesn't correct PSECTs properly when ENCOD overlays NPVAR.
Diagnosis:
POSTLD just checks to see that ENCODZ is less than <XCDSEC,,<NPVAR-1>>
but doesn't do anything about NPVAR's start when this happens.
Solution:
Write code to fix NPVAR's origin when ENCOD overlaps it.
[End of TCO 7.1194]
TCO-number: 7.1196
Written-by: WADDINGTON Creation-date: 25-Jan-88 09:41:50
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: LATSRV
Problem: Global job number gets stored in device tables for reverse LAT ttys.
Need local job number...
Diagnosis: TCO 7.1186 is wrong.
Solution: Rip out TCO 7.1186. Instead of storing the global job number in
the PR block, call LCL2GL to translate the local job number to a global job
number when needed.
[End of TCO 7.1196]
TCO-number: 7.1197
Written-by: GSCOTT Creation-date: 25-Jan-88 23:46:32
Edited-by: GSCOTT Edit-date: 26-Jan-88 00:49:31
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: PAGEM
Problem:
MDDT JSYS seems to hang when swappable monitor not locked down.
Diagnosis:
Section XCDSEC is mapped roughly where NRCOD is mapped in section 0/1, and
the rest of the section is mapped to section 0/1 through MMAP. When code
was moved to XCDSEC recently by someone with good intentions, he caused
XRCOD+XNCOD+ENCOD to exceed the size of NRCOD, something apparently which
was unthinkable in 6.1. When MDDT is run it just happens to go to section
ENCOD, which can be at a higher address than the end of NRCOD in the
AN-MONDCN monitor. When routine FPTAXC is called to resolve the page fault
(assuming that SWPMLK wasn't called sometime after GOTSWM to lock down
this page of ENCOD), the address of the first page in ENCOD appears to be after
the end of NRCOD, so FPTAXC thinks that this is time for a section 0/1
page fault and dispatches to take care of that. We then start looping in the
code trying to resolve this page fault in section 6 by trying to substitute a
page in section 0/1 space that isn't mapped.
Solution:
Change FPTAXC in PAGEM to check the address range from XRCODP (first page in
section 6 address space) to ENCODL (last page in section 6 address space)
rather than using NRCODP and NRCODL. A recent edit to POSTLD makes it move NPVAR
up so that it won't collide with ENCOD or NRCOD.
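The effect of the range change can be illustrated with a small sketch. The
page numbers below are invented for the example and are not taken from any
monitor build; only the symbol names come from the TCO. It shows a fault on
an ENCOD page that lies past the end of NRCOD being misclassified by the old
bounds and recognized by the new ones.

```python
def in_range(page, first_page, last_page):
    """True if `page` lies within the inclusive page range."""
    return first_page <= page <= last_page


# Hypothetical page numbers, chosen so that ENCOD extends past NRCOD.
NRCODP, NRCODL = 0o300, 0o477     # old bounds used by the check
XRCODP, ENCODL = 0o300, 0o577     # new bounds: all section 6 code

fault_page = 0o520                # an ENCOD page beyond the end of NRCOD

old_check = in_range(fault_page, NRCODP, NRCODL)   # False: treated as a section 0/1 fault
new_check = in_range(fault_page, XRCODP, ENCODL)   # True: handled as section 6 code
```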
[End of TCO 7.1197]
TCO-number: 7.1200
Written-by: GSCOTT Creation-date: 26-Jan-88 16:28:20
Edited-by: GSCOTT Edit-date: 26-Jan-88 16:35:01
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: MEXEC JSYSA JSYSM
Problem:
MEXEC needs to go on a code diet. JSYSA is also too big. Not only that, but
we are low on section 0/1 space.
Diagnosis:
MEXEC and JSYSA are two of the oldest modules and therefore are two of the
biggest modules. We always need more section 0/1 space, don't we?
Solution:
Split MEXEC into MEXEC and JSYSM, moving all JSYS code to JSYSM. This makes
MEXEC about half its previous size.
Also move CRJOB, LOGIN, and USAGE code from JSYSA to JSYSM. Move GJINF, TIME,
RUNTM, GTRPI, GTDAL, SYSGT, GETAB, SETSN, SETNM, GETNM, GETJI, SWTCH, LITES,
USRIO, PEEK, and XPEEK to XCDSEC. This gets the free section 0/1 space up to
10 pages in the AN-MONDCN monitor.
[End of TCO 7.1200]
TCO-number: 7.1201
Written-by: MCCOLLUM Creation-date: 27-Jan-88 09:53:32
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: LATSRV
Problem:
Throughput on LAT terminal lines is not great.
Diagnosis:
The LAT slot size on TOPS-20 is only 40 bytes. Throughput would be improved
with any increase in this size.
Solution:
Due to address space limitations, the number of terminal buffers available
to LAT lines cannot be increased. However, the current size of 123 (decimal)
bytes would allow for slightly larger slot sizes. Increase the slot size
to 60 (decimal) bytes.
[End of TCO 7.1201]
TCO-number: 7.1202
Written-by: RASPUZZI Creation-date: 28-Jan-88 14:45:55
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: Yes
Program: MONITOR
Routines-affected: PHYKLP
Problem:
PHYKLP does not gather statistics for how long it spends servicing
interrupts.
Diagnosis:
No code to do it.
Solution:
Make PHYKLP do something similar to PHYKNI when it starts the processing
of an interrupt. Mainly, save the time the interrupt service starts, and
then calculate the time spent in interrupt service. This will be added
to a location (TOTKLP) that will hold the total amount of time PHYKLP
spent servicing interrupts.
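The bookkeeping described above amounts to the following pattern. This is a
sketch only: the class name and the injectable clock are assumptions made so
the example can run anywhere, and the `total` field merely plays the role
that TOTKLP plays in the monitor.

```python
import time


class InterruptTimer:
    """Accumulate the time spent inside an interrupt service routine."""

    def __init__(self, clock=time.perf_counter):
        self.clock = clock      # injectable so the example can be tested
        self.total = 0.0        # stands in for TOTKLP

    def service(self, handler):
        start = self.clock()    # save the time the service starts
        try:
            return handler()
        finally:
            # add the elapsed time to the running total, even on error
            self.total += self.clock() - start
```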
[End of TCO 7.1202]
TCO-number: 7.1203
Written-by: MCCOLLUM Creation-date: 28-Jan-88 15:12:06
Edited-by: MCCOLLUM Edit-date: 28-Jan-88 15:13:08
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: STG
Problem:
DCN=0 monitors crash at startup.
Diagnosis:
When DCN=0, the NODE% JSYS is defined in module STG in section 0/1. JSTAB
assumes that NODE% is in section XCDSEC.
Solution:
Move the NODE% JSYS that is under DCN=0 conditional in STG to section XCDSEC.
[End of TCO 7.1203]
TCO-number: 7.1204
Written-by: RASPUZZI Creation-date: 2-Feb-88 14:51:52
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: DIRECT
Problem:
ILMNRF BUGHLTs when doing partial recognition on directories.
Diagnosis:
Partial recognition on directories is not implemented, yet routine
STRFND thought it would go ahead and do it without asking us.
Solution:
Don't make STRFND do partial recognition work if we are recognizing
a directory specification.
[End of TCO 7.1204]
TCO-number: 7.1205
Written-by: RASPUZZI Creation-date: 2-Feb-88 14:54:27
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: CLUDGR
Problem:
Weird, wonderful and wild BUGHLTs when using the INFO% JSYS.
Diagnosis:
Some of the JSYS have been moved into section 6. These include
XPEEK%, GETAB%, GETJI% and SYSGT%. INFO% does an IMCALL to each
one of these if the local node has been specified in the INFO%
argument block.
Solution:
Have INFO% do IMCALLs to these JSYS but make sure that the IMCALL
uses the correct section.
[End of TCO 7.1205]
TCO-number: 7.1206
Written-by: RASPUZZI Creation-date: 2-Feb-88 15:00:19
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: MSTR IPCF
Problem:
MOUNTR is complaining about not writing accounting records
for mount requests.
Diagnosis:
Edit 7508 introduced a change to make the monitor pass the
global job number to MOUNTR. Unfortunately, this edit added
a TRVAR variable in 2 routines in MSTR. This is bad as routine
DISMES in IPCF depends on the order of the TRVAR.
Solution:
Shoot person who wrote DISMES. Since that is not feasible, nor
is it legal, then we will just fix the TRVAR in MSTR and make
a little note for our future siblings to be careful when treading
in this code.
[End of TCO 7.1206]
TCO-number: 7.1207
Written-by: WADDINGTON Creation-date: 2-Feb-88 15:11:50
Edit-checked: No Document: Yes TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: LATSRV STG MONSYM
Problem: New LATOP% function .LARHC doesn't talk to the ACJ.
Diagnosis: It's in the spec but we didn't have time to implement it prior
to FT1.
Solution: Add it now, before we freeze for FT2...
[End of TCO 7.1207]
TCO-number: 7.1208
Written-by: RASPUZZI Creation-date: 2-Feb-88 16:02:31
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: STG
Problem:
Remote possibility of a SKDPF1 BUGHLT in working set management code.
Diagnosis:
When the monitor adjusts a user's working set, it may decide to
swap out a set of pages. This decision is made at scheduler level.
When the swap out occurs, the monitor then calls CLROFN which calls
DASOFN. DASOFN then references the ALOC tables. These tables are
not resident and could possibly be swapped out at the time. Page
faults in the scheduler are illegal.
Solution:
Move the ALOC tables from non-resident storage to resident storage.
[End of TCO 7.1208]
TCO-number: 7.1209
Written-by: MCCOLLUM Creation-date: 3-Feb-88 13:25:30
Edited-by: MCCOLLUM Edit-date: 3-Feb-88 13:41:21
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: LOKWAI
Problem:
System load average and scheduler overhead increase when an
incoming CTERM job is XOFFed.
Diagnosis:
The scheduler test LOKWAI is used to cause a job to block when a running
CTERM circuit is XOFFed. A bad compare instruction in this routine causes
it to return +2 (success) and the fork is woken up. Because the fork is
not ready to do output, it blocks again immediately. This cycle is repeated
until the CTERM circuit is XONed.
Solution:
Fix the compare instruction in the LOKWAI scheduler test. If the
circuit's state goes from "run" to some other state, then the fork
should wake up so it can be killed off.
[End of TCO 7.1209]
TCO-number: 7.1210
Written-by: GSCOTT Creation-date: 3-Feb-88 15:52:34
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: MANY
Problem:
Many BUGs not set "normally not dumpable".
Diagnosis:
We only recently decided which ones were "normally not dumpable" for DOB.
Solution:
Change the ones we decided, and only the ones we decided, to have the
"normally not dumpable" bit set in their BUG. macros. Many monitor
modules will be changed for this.
[End of TCO 7.1210]
TCO-number: 7.1212
Written-by: RASPUZZI Creation-date: 4-Feb-88 13:23:24
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: CLUDGR
Problem:
ILMNRF BUGHLTs out of INFO% JSYS.
Diagnosis:
When a remote system is out of free space and cannot create a request
block for the request that it just got, it does not correctly report
the INFX08 error to the local system. The local system now thinks that
it just got a good result back from the remote system and things go
downhill from there. The local system uses wrong free space size and
address and tries to do something with this bogus address.
Solution:
When an INFO% request has arrived and the system cannot create a
request block (due to insufficient resources), make sure that the
INFX08 error is returned but also ensure that CL%ERR is lit in the
flag word of the response. This will make the requesting system
handle the error case correctly.
[End of TCO 7.1212]
TCO-number: 7.1213
Written-by: GSCOTT Creation-date: 4-Feb-88 13:54:18
Edited-by: GSCOTT Edit-date: 5-Feb-88 11:07:56
Edit-checked: No Document: Yes TCO-tested: No
Maintenance-release: No Hardware-related: Yes
Program: MONITOR
Routines-affected: PHYH2
Problem:
Class A massbus errors cause ILLGO BUGHLTs.
Diagnosis:
If the RH20 has a stacked transfer pending and if the massbus device
experiences a Class A Massbus error, the monitor does not clear the stacked
transfer before starting a retry of the primary transfer. This leads to a hung
transfer.
When the monitor clears the RH20 to start the transfer over it does not clear
the secondary transfer registers, and at the end of the hung transfer code an
extra interrupt is generated. This interrupt causes the monitor to believe
that one of the transfers it just restarted has completed. The monitor then
throws away the IORB for that first transfer and stacks yet another transfer
(if one is available). The original first transfer then completes properly.
Then the monitor checks out the RH20 logout area with the IORB it thinks
belongs to the transfer (the second transfer started after the hung code tried
to reset the RH20), and since things smell funny the monitor gives an ILLGO
BUGHLT.
A detailed investigation was performed by Nat Gillespie, System Support Group,
UKCSC Basingstoke, UK.
Solution:
Change at RH2HNG from MOVEI T1,0 to MOVEI T1,1, in order to do a complete reset
of the RH20 (including the secondary registers), which prevents the extra
interrupt from being generated. This then prevents the ILLGO BUGHLT.
[End of TCO 7.1213]
TCO-number: 7.1214
Written-by: GSCOTT Creation-date: 4-Feb-88 14:03:30
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: PHYM78
Problem:
TM8FKRs aren't too informative.
Diagnosis:
Only the channel and unit numbers are printed.
Solution:
Print the useful parts of the additional information in the BUGCHK.
[End of TCO 7.1214]
TCO-number: 7.1215
Written-by: GSCOTT Creation-date: 4-Feb-88 16:50:24
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: DOB
Problem:
DOB can trash multipack structures. Also it may not be able to read them
properly (if super index block and/or index block for page 0 are not on
the first unit of a multipack structure).
Diagnosis:
Code looks wrong and generally stinks.
Solution:
Fix several places where multipack structures aren't treated right: in
GETPGS don't forget to do TLZ DSKMSK; remove routine NEXTXB and put 3
instructions inline in GETPGS instead; fix strange AC usage in MAPXB;
move disk address computation from CCWSET to UDBSET. Also comment out
code that searches for a specific generation on disk, and add small
routine that prints out the usual "? DOB Error:" string. Output message
if we are aborting the dump.
[End of TCO 7.1215]
TCO-number: 7.1216
Written-by: GSCOTT Creation-date: 5-Feb-88 11:03:02
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: POSTLD
Problem:
Many BUGs are set not normally dumpable. It is a royal pain to figure
out which ones are.
Diagnosis:
It would be logical to expect this type of information in BUGSTRINGS.TXT
Solution:
Add code to POSTLD in DOBUGS to output a "*" in column 1 of BUGSTRINGS.TXT
if the BUG is normally not dumpable and a space in column 1 if the BUG
is normally dumpable.
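A sketch of the column 1 convention follows. Only the marker in column 1
comes from the TCO; the rest of the line layout and both function names are
assumptions made for illustration.

```python
def bugstrings_line(name, text, not_dumpable):
    """Format one entry; column 1 is "*" for normally-not-dumpable BUGs."""
    marker = "*" if not_dumpable else " "
    return f"{marker}{name} {text}"      # layout after column 1 is assumed


def is_not_dumpable(line):
    """Read the flag back from column 1 of a formatted entry."""
    return line[:1] == "*"
```

For example, bugstrings_line("MSCCDF", "example text", True) yields a line
beginning with "*", and is_not_dumpable() recovers the flag from it.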
[End of TCO 7.1216]
TCO-number: 7.1218
Written-by: GSCOTT Creation-date: 9-Feb-88 10:48:37
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: ALL
Problem:
Copyrights are out of date.
Diagnosis:
It's up to me to fix them.
Solution:
Use EMACS/DIRED and update to corporate copyrights.
[End of TCO 7.1218]
TCO-number: 7.1219
Written-by: GSCOTT Creation-date: 9-Feb-88 14:19:51
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: DTESRV
Problem:
Front end dumps end up in PS:<SYSTEM> rather than BS:<SYSTEM>.
Diagnosis:
DTESRV copies them to <SYSTEM>nDUMP11.BIN. It should use BS:.
Solution:
Make DTESRV know about even more BS than it knows now.
[End of TCO 7.1219]
TCO-number: 7.1220
Written-by: RASPUZZI Creation-date: 11-Feb-88 15:24:46
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: CFSUSR GLOBS STG
Problem:
CFSUSR allocates 128 slots of global jobs at system startup. It
is not necessary to have so many unused slots in use on the local
system. It also prevents one from adding a fifth or even sixth
KL to the cluster.
Diagnosis:
Routine CFGTJB does this allocation at system startup. It is
not necessary to grab so many slots.
Solution:
At system initialization, only have CFGTJB allocate 64 jobs for
the local system. When all 64 are in use, then routine JBGET1
will call CFSGJB (an alternate entry point to CFGTJB) and this
will attempt to get 32 more global job slots for the system.
64 seemed like a good number because this will allow one to hook
up as many as 8 KL CPUs in the cluster and then all global job
slots will be in use.
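The allocation policy above can be modeled roughly as follows. This is a
sketch, not the CFSUSR code: the class names are hypothetical, and the
cluster-wide total of 512 slots is inferred from the statement that 8
systems at 64 slots each use every slot.

```python
TOTAL_GLOBAL_SLOTS = 512    # assumed: 8 systems x 64 slots
INITIAL_GRANT = 64          # taken at system initialization
EXTRA_GRANT = 32            # requested when all local slots are in use


class Cluster:
    def __init__(self):
        self.free = TOTAL_GLOBAL_SLOTS

    def grant(self, wanted):
        """Hand out up to `wanted` slots, fewer if the cluster is short."""
        got = min(wanted, self.free)
        self.free -= got
        return got


class System:
    def __init__(self, cluster):
        self.cluster = cluster
        self.slots = cluster.grant(INITIAL_GRANT)    # CFGTJB's role at startup
        self.in_use = 0

    def get_job(self):
        """Claim one global job slot; return False if none can be had."""
        if self.in_use == self.slots:
            self.slots += self.cluster.grant(EXTRA_GRANT)   # CFSGJB's role
        if self.in_use == self.slots:
            return False
        self.in_use += 1
        return True
```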
[End of TCO 7.1220]
TCO-number: 7.1221
Written-by: RASPUZZI Creation-date: 11-Feb-88 15:32:54
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: PHYKNI KNILDR
Problem:
Every time a new edit of the NI microcode appears, you have
to hack KNILDR after loading the CRAM addresses to make KNILDR
fool the monitor into believing this is the right NI ucode.
Diagnosis:
This may come as a surprise, but no one ever changed KNILDR or the
monitor to use the right bit mask for the major version of the NI
ucode. Also, in PHYKNI, the monitor will not load and start the
KLNI if the wrong "version" of the NI ucode is loaded.
Solution:
Make PHYKNI check the edit level of the NI ucode. If it is not
up to 171 for 7.0, then BUGCHK but still start the KLNI. For
6.1, we will BUGCHK if the ucode is not at least 167.
[End of TCO 7.1221]
TCO-number: 7.1222
Written-by: RASPUZZI Creation-date: 11-Feb-88 15:37:50
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: STG
Problem:
Can't build a DEBUG=1 monitor.
Diagnosis:
Not enough section 0/1 space.
Solution:
Decrease JSB free space by a few pages and drop a few SNOOP
pages when building a DEBUG=1 monitor.
[End of TCO 7.1222]
TCO-number: 7.1223
Written-by: RASPUZZI Creation-date: 11-Feb-88 20:12:30
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: RSXSRV
Problem:
Can't compile RSXSRV.
Diagnosis:
When the copyright was updated, the exclamation point that was used
to delimit a big comment field was deleted. This caused the next
exclamation point seen to be used to delimit the comment field. Naturally,
this now shifted the field and ate some code so there were various
undefines and oddities.
Solution:
Put back the ! that was removed.
[End of TCO 7.1223]
TCO-number: 7.1225
Written-by: GSCOTT Creation-date: 16-Feb-88 13:45:04
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: DISC
Problem:
Files can go from 34359738367(36) to 1073741823(36).
Diagnosis:
TCO 7.1059 incomplete - didn't change code at UPDLEN.
Solution:
Add code in DISC to do the same OFNLEN/-1 hack for UPDLEN that was
implemented in TCO 7.1059.
[End of TCO 7.1225]
TCO-number: 7.1226
Written-by: RASPUZZI Creation-date: 16-Feb-88 14:22:55
Edited-by: RASPUZZI Edit-date: 16-Feb-88 14:23:52
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: MEXEC BOOT
Related-SPR: 21571
Problem:
TOPS-20 does not ask "Why reload?" or "Run CHECKD?" when a new monitor
is booted when RSX20F version 15-50 is on the front end.
Diagnosis:
RSX20F version 15-50 will now give the KL the date and time (if it has
it set) when the KL asks for it. RSX20F never used to do this. When
TOPS-20 gets the date and time from the front end, it assumes that it
is being reloaded because of a BUGHLT. This may not be the case if a
new monitor is being loaded.
Solution:
The solution of this problem is twofold. First, have BOOT tell TOPS-20
if it is being reloaded because of a BUGHLT. Second, have TOPS-20 decide
what to do about the startup questions based on the information provided
by BOOT. BOOT will put a positive number in location BOOTFL if the system
is being manually restarted and it will put a negative value in BOOTFL if
the system is being restarted because of a BUGHLT. For backwards compatibility,
the monitor will do what it used to do if BOOTFL contains a 0. In this case,
an older boot will be running.
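The decision described above reduces to a three-way test on BOOTFL. The
sketch below is an interpretation, not the MEXEC code: the function and
parameter names are hypothetical, and the zero case encodes the old behavior
as the Diagnosis describes it (a date and time from the front end is taken
to mean a BUGHLT reload, so the questions are skipped).

```python
def ask_startup_questions(bootfl, got_time_from_front_end):
    """Return True if "Why reload?" and "Run CHECKD?" should be asked."""
    if bootfl > 0:
        return True     # BOOT says this is a manual restart
    if bootfl < 0:
        return False    # BOOT says this is a reload after a BUGHLT
    # BOOTFL is 0: an older BOOT is running, so fall back to the old rule
    return not got_time_from_front_end
```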
[End of TCO 7.1226]
TCO-number: 7.1227
Written-by: WADDINGTON Creation-date: 16-Feb-88 14:35:09
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: LATSRV
Problem: LATNSC information not always helpful. In particular, the Ethernet
Address is sometimes trashed.
Diagnosis: Edit 7369 added the Ethernet Address to the additional data in the
LATNSC BUGINF. Unfortunately, we used the Ethernet Address from the Circuit
Block. In some cases, a LATNSC can occur when there is no Circuit Block.
Consequently, the Ethernet address is garbage in these cases.
Solution: Get the Ethernet Address from the Receive Buffer. This should always
be correct. In addition, add a little more info to the description of the
BUGINF.
[End of TCO 7.1227]
TCO-number: 7.1230
Written-by: GSCOTT Creation-date: 18-Feb-88 15:13:28
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: MEXEC JSYSM TTYSRV GLOBS
Problem:
CTY doesn't stay as an LA120 after system starts up - first <CRLF> on CTY
turns it back into SYSTEM-DEFAULT.
Diagnosis:
Code in MEXEC and JSYSM and TTYSRV doesn't special case the CTY properly.
Solution:
Fix code to (1) make the CTY an LA120 in INICTY (TTYSRV), (2) create new
routine BLINKS in JSYSM that does the usual TLINK JSYS when we are done with a
terminal to break links and advice, (3) modify 4 places in MEXEC and JSYSM to
call BLINKS, and (4) have the BLINKS routine set the CTY to be an LA120 after
doing the TLINK. In this way, whenever a job is not logged in on the CTY
it will be an LA120.
[End of TCO 7.1230]
TCO-number: 7.1231
Written-by: RASPUZZI Creation-date: 18-Feb-88 15:19:32
Edit-checked: No Document: Yes TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: EXECIN EXECSE GLOBS STG JSYSA JSYSF
MONSYM SETSPD
Problem:
The monitor does not restrict the length of a password that a user
can set on a directory.
Diagnosis:
No intelligent code to do it.
Solution:
Add code to CRDIR% to make sure that passwords pass the minimum length
criterion. This is done before the password is encrypted, so if CRDIR%
is given an already encrypted password, its length will not be
checked.
Add a new SMON% function to set the minimum password length and add a
corresponding TMON% function to read this length.
Add a new ^ESET MINIMUM-PASSWORD-LENGTH command to the EXEC so that
this can be set easily. Also, make a similar ENABLE/DISABLE command in
SETSPD so minimum lengths can be set at system startup.
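The policy above can be sketched as a small model. The class and method
names are hypothetical and do not correspond to monitor symbols; the set and
read methods stand in for the new SMON% and TMON% functions, and the check
stands in for the test added to CRDIR%.

```python
class PasswordPolicy:
    """System-wide minimum password length (illustrative model only)."""

    def __init__(self):
        self.min_length = 0             # 0 means no restriction

    def set_min_length(self, n):        # role of the new SMON% function
        if n < 0:
            raise ValueError("minimum length cannot be negative")
        self.min_length = n

    def read_min_length(self):          # role of the new TMON% function
        return self.min_length

    def accepts(self, password, already_encrypted=False):
        """Check length before encryption; encrypted input is not measured."""
        if already_encrypted:
            return True
        return len(password) >= self.min_length
```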
[End of TCO 7.1231]
TCO-number: 7.1237
Written-by: GSCOTT Creation-date: 22-Feb-88 10:56:49
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: DOB
Problem:
Bugs set not normally dumpable aren't.
Diagnosis:
DB%NND check lost from DOB.MAC.
Solution:
Reinsert check for DB%NND in DOB.MAC, start checking REDITs a little
more carefully.
[End of TCO 7.1237]
TCO-number: 7.1238
Written-by: GSCOTT Creation-date: 22-Feb-88 18:09:03
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: MANY
Problem:
BUGs under DEBUG monitor dump when they shouldn't.
Diagnosis:
Many BUGs not looked at because the BUGSTRINGS used was from a DEBUG=0
monitor.
Solution:
Set a few more BUGs not normally dumpable.
[End of TCO 7.1238]
TCO-number: 7.1240
Written-by: MCCOLLUM Creation-date: 23-Feb-88 15:42:44
Edited-by: MCCOLLUM Edit-date: 23-Feb-88 15:44:18
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: LATSRV
Problem:
COMMMS BUGHLTs.
Diagnosis:
When LATSRV hands a buffer to PHYKNI to be transmitted, bit DLL.FL is turned
on in the transmit buffer header. When the transmit completes, DLL.FL is
turned off and the buffer is placed on the unacknowledged queue. If no
acknowledgement is received from LATSRV before the circuit timer expires,
routine XUNAKQ is called to retransmit the buffers. This routine neglects
to turn on DLL.FL in the buffer header. If for any reason the circuit is
stopped while the buffer is in PHYKNI, LATSRV will release the free space
associated with all buffers on the unacknowledged queue that have this bit
turned off. Since XUNAKQ never lights this bit, this is true for all buffers,
even the ones currently in PHYKNI. When the retransmit subsequently completes,
LATSRV attempts to release the free space again and a COMMMS BUGHLT results.
Solution:
Turn on bit DLL.FL in XUNAKQ when the buffer is handed over to PHYKNI via
routine DLLUNI. If DLLUNI fails to queue the buffer, turn DLL.FL off.
If the circuit goes away while the buffer is in PHYKNI, the free space
will be released when the transmit completes.
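The double release can be reproduced in a toy model. Everything here is a
simplification with hypothetical names: `dll_fl` stands for bit DLL.FL, and
a second release raises an exception where the monitor would take a COMMMS
BUGHLT. Passing fixed=False models XUNAKQ before this TCO.

```python
class Buffer:
    def __init__(self):
        self.dll_fl = False     # set while the buffer is owned by the driver
        self.freed = 0


def release(buf):
    """Release free space; releasing twice is the error being modeled."""
    buf.freed += 1
    if buf.freed > 1:
        raise RuntimeError("buffer released twice")


def retransmit(buf, fixed, queued_ok=True):
    """Hand the buffer to the driver again (XUNAKQ's role)."""
    if fixed:
        buf.dll_fl = True           # the fix: mark it as in the driver
        if not queued_ok:
            buf.dll_fl = False      # the driver refused it, so clear the bit


def circuit_stopped(unacked):
    """Release every unacknowledged buffer the driver does not hold."""
    for buf in unacked:
        if not buf.dll_fl:
            release(buf)


def transmit_complete(buf, circuit_alive):
    """Driver is done; release the buffer if its circuit has gone away."""
    buf.dll_fl = False
    if not circuit_alive:
        release(buf)
```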
[End of TCO 7.1240]
TCO-number: 7.1241
Written-by: RASPUZZI Creation-date: 25-Feb-88 08:29:15
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: FILMSC
Related-QAR: 105
Problem:
ILMNRF BUGHLTs.
Diagnosis:
In FILMSC, we call routine TTYSCN and if this routine fails then
we JRST to TTYCL1. TTYCL1 expects to have T1 set up with an index
into the device tables and we have not done so.
Solution:
At TTYCL1, load T1 with the index into the device tables with the
item saved in STKVAR variable location TTYCLX.
[End of TCO 7.1241]
TCO-number: 7.1243
Written-by: GSCOTT Creation-date: 26-Feb-88 10:17:55
Edit-checked: No Document: Yes TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: MNETDV
Problem:
We are all real sick and tired of waiting for HSTINI to carefully polish
and store all of the hosts in a 572K character HOSTS.TXT file when we are
debugging the monitor.
Diagnosis:
Cretinous code doesn't know that DBUGSW is set up for debugging.
Solution:
Make MNETDV know about DBUGSW and if it is greater than 1 try to load
SYSTEM:HOSTS.DEBUG rather than SYSTEM:HOSTS.TXT.
[End of TCO 7.1243]
TCO-number: 7.1244
Written-by: GSCOTT Creation-date: 28-Feb-88 17:33:08
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: DOB
Problem:
DOB has problems after TCO 7.1215 is installed.
Diagnosis:
This TCO busted GETPGS. This discovered a problem with IORBER. It causes
page faults since it is damaged. Also, IORBs showing an error after DOB has
continued the system cause KPALVH. Also, an error in the middle of the dump
detected by PHYSIO doesn't abort the dump since the error bits are not checked
in SAVMEM's loop.
Solution:
Fix GETPGS, IORBER. Use a 64 page chunk size to avoid overruns too.
[End of TCO 7.1244]
TCO-number: 7.1245
Written-by: GSCOTT Creation-date: 1-Mar-88 14:45:42
Edited-by: GSCOTT Edit-date: 1-Mar-88 15:02:27
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: T20-AN LN2070
Problem:
Monitor builds too slow for development. Or maybe I'm just not patient enough
in my old age.
Diagnosis:
T20-AN70 always builds AN-MONBIG, AN-MONMAX, and AN-MONDCN, however only
AN-MONDCN is ever used for development. LN2070 always builds 2060-MONBIG and
2060-MONMAX, however only 2060-MONMAX is ever used for development. You can't
just use labels ARPDCN or MONMAX since they don't compile the sources and
append the REL files. Developers commonly copy the CTL files and remove the
building of the monitors that are not used. It seems reasonable to standardize
this.
Solution:
Add new tag MONDEV which compiles the sources, appends REL files, then builds
just AN-MONDCN or 2060-MONMAX.
[End of TCO 7.1245]
TCO-number: 7.1246
Written-by: GSCOTT Creation-date: 1-Mar-88 15:04:21
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: DSKALC
Problem:
The ASAASG, DEAUNA, and ASGBPG BUGs are set normally dumpable.
Diagnosis:
They shouldn't be.
Solution:
Make them not normally dumpable.
[End of TCO 7.1246]
TCO-number: 7.1247
Written-by: RASPUZZI Creation-date: 1-Mar-88 15:22:29
Edited-by: RASPUZZI Edit-date: 4-Mar-88 10:18:20
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: CLUDGR
Problem:
XBLTAL BUGHLTs when using INFO% JSYS under mysterious circumstances.
Diagnosis:
This is a dandy. Basically, 2 forks on the remote system got the same
unique CLUDGR ID (I guess they weren't unique then, were they?). This
confused the remote system because it received 2 packets both indicating
that they were 1 of 1. Nasty things then happen when the CLUDGR fork
attempts to transfer the information in the working data page of the
CLUDGR fork.
Solution:
Well, I wrote this code and I haven't got a clue as to how this could
happen. The CLUID is obtained while a process is NOSKED and the obtainment
of CLUID uses an AOS Q3,CLUID instruction which is not interruptible either.
So, what do we do? DEBUG code has been added to CLUDGR to be defensive
about a process using an old unique code. The only gotcha about this code
is that a system may crash with a CLUFUD BUGHLT when the CLUID word wraps
around 18 bits.
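One way to picture the wraparound hazard is sketched below. This is an
interpretation, not the CLUDGR code: the class is hypothetical, and it
assumes the defensive check fires when a newly issued ID matches one still
in use, which can only happen once the 18-bit counter has wrapped. The
exception stands in for the CLUFUD BUGHLT.

```python
MASK18 = 0o777777       # largest value that fits in 18 bits


class IdAllocator:
    def __init__(self, start=0):
        self.counter = start
        self.active = set()     # IDs handed out and not yet released

    def allocate(self):
        """Issue the next ID, refusing one that is still in use."""
        self.counter = (self.counter + 1) & MASK18     # wraps to 0 after MASK18
        if self.counter in self.active:
            raise RuntimeError("ID reused while still active")
        self.active.add(self.counter)
        return self.counter

    def release(self, ident):
        self.active.discard(ident)
```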
[End of TCO 7.1247]
TCO-number: 7.1251
Written-by: RASPUZZI Creation-date: 3-Mar-88 13:25:48
Edited-by: RASPUZZI Edit-date: 4-Mar-88 10:19:43
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: D36PAR
Problem:
NTMSQF BUGCHKs during system startup.
Diagnosis:
It appears that the signal queue is not large enough for a machine
that is used as a designated router.
Solution:
Increase the potential size of the signal queue.
[End of TCO 7.1251]
TCO-number: 7.1252
Written-by: RASPUZZI Creation-date: 3-Mar-88 14:53:44
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: FREE
Problem:
RELRNG BUGCHKs.
Diagnosis:
RELRNGs are serious enough that they should be BUGHLTs but
only for the duration of field test. They must be changed back
to BUGCHKs when 7.0 is shipped to the SDC.
Solution:
Change RELRNG to BUGHLT for now.
[End of TCO 7.1252]
TCO-number: 7.1253
Written-by: GSCOTT Creation-date: 6-Mar-88 19:36:03
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: MEXEC GLOBS
Problem:
DDMPNRs during the time that SETSPD is copying a DOB dump.
Diagnosis:
SETSPD takes too long to copy large files, and since CHKR is blocked
waiting for SETSPD to run, DDMPNRs are the result.
Solution:
Change MEXEC's routine DOBSSP to just start the DOB copy then return,
storing the fork handle in DOBFRK. Have CHKR see if there is a DOBFRK
and call a little routine (DOBKSP) to kill SETSPD if it has finished.
Don't start a new SETSPD in DOBSSP unless DOBFRK is zero.
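
The start-then-poll arrangement described above might look roughly like this in Python. It is a toy model under stated assumptions: FakeCopy, Monitor, start_copy and periodic_check are hypothetical names, with copy_handle playing the role of DOBFRK.

```python
# Illustrative model only; none of these names are monitor symbols.
class FakeCopy:
    """Stands in for a background copy that needs a few ticks to finish."""
    def __init__(self, ticks):
        self.ticks_left = ticks

    def tick(self):
        if self.ticks_left > 0:
            self.ticks_left -= 1

    def finished(self):
        return self.ticks_left == 0


class Monitor:
    def __init__(self):
        self.copy_handle = None      # plays the role of DOBFRK

    def start_copy(self, ticks):
        """Start the copy and return at once; refuse if one is active."""
        if self.copy_handle is not None:
            return False
        self.copy_handle = FakeCopy(ticks)
        return True

    def periodic_check(self):
        """Run from the periodic checker; reap the copy once it is done."""
        if self.copy_handle is not None and self.copy_handle.finished():
            self.copy_handle = None
            return True
        return False
```

The point of the design is that the checker never blocks on the copy; it only looks at the handle on each pass.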
[End of TCO 7.1253]
TCO-number: 7.1254
Written-by: GSCOTT Creation-date: 6-Mar-88 23:00:45
Edited-by: GSCOTT Edit-date: 8-Mar-88 17:46:50
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: DOB STG
Problem:
(1) Monitor and/or RSX-20F dies in the middle of a DOB error. RSX20F hangs
seen at the beginning or end of a DOB session.
(2) Still problems with DOI.
(3) Section 0/1 space wasted.
(4) Dump may have incorrect information.
Diagnosis:
(1) PIs on at bad times (like when calling MRTOFF, WATEPT and UNWEPT).
Printing out error messages when called back by PHYSIO is a bad idea.
(2) Apparently PHYSIO gets gastric distress when fed a rich meal of a number of
IORBs each transferring many pages.
(3) We got a page then divided it up into NUMIOR size pieces; this is wasteful
since the transfers can only be XFRPAG pages long.
(4) IORB still busy writing data when we turn on timesharing, this causes
memory to be modified before it can be dumped.
Solution:
(1) Routine SAVPI should return PIOFF, then we can call mysterious routines,
then go PION. Shut off the PI system before calling UNWEPT and PFHRST, letting
RESTPI restore and reenable the PI system. Don't try to print out the errors
using DOB routines when being called back by PHYSIO. IORBDN will set IORBER to
the address of the IORB that had an error and light DB%ERR in DOBSTS. IORBER
is now called by SAVMEM when DB%ERR is set.
(2) Use one IORB and a generous number of contiguous pages for dumping to avoid
DOI and let PHYSIO recover from any overrun.
(3) Get section 0/1 space for the CCWs using an RS macro based on the computed
largest possible IORB (XFRPAG pages) or a CCW size of (XFRPAG/XFRSIZ)+3. The
resident general pool can now be cut back to 1400 words, removing TCO 7.1177.
(4) Wait for all I/O to complete before shutting off and resetting the PI
system (and returning back to APRSRV).
[End of TCO 7.1254]
TCO-number: 7.1257
Written-by: RASPUZZI Creation-date: 17-Mar-88 19:08:45
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: JSYSF
Problem:
Starting a monitor with the 143 dialog causes CFCLDPs.
Diagnosis:
Edit 7440 made to DIRINI assumed that CRDIR% was the only caller
of DIRINI. As the 143 dialog demonstrates, this is not the case.
DIRINI calls CRDSWH which attempts to store something in a JSBVAR
location which does not exist because we did not come through here
via CRDIR%.
Solution:
Make DIRINI have 2 entry points. One for CRDIR% and one for all
other callers. Only call CRDSWH when DIRINI is entered via CRDIR%.
[End of TCO 7.1257]
TCO-number: 7.1258
Written-by: RASPUZZI Creation-date: 17-Mar-88 19:20:22
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: ENQ FREE
Related-QAR: 88
Problem:
FSPOUT while running a program that does like many ENQs and
like many DEQs like a lot.
Diagnosis:
Routine FSPREM like only returns like the total amount of the
free space like left in the ENQ pool. ENQ like checks this
and fer sure the count is high enough but there may not be
a block in the pool that is big enough to like satisfy ENQ's
hungry request.
Solution:
Like add some code to that narly routine FSPREM and have it
also return the largest block that is remaining in the pool.
Then like teach ENQ that like this is the number that it really
cares about. If the largest block is like too small, then have
ENQ clean up those tubular cached lock blocks.
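
The difference between total free space and the largest free block can be shown with a short Python sketch. The function names and the list-of-block-sizes representation are assumptions made for the example, not the layout of the real ENQ pool.

```python
# Illustrative only: a free pool modelled as a list of free block sizes.
def pool_stats(free_blocks):
    """Return (total free words, largest single free block)."""
    total = sum(free_blocks)
    largest = max(free_blocks, default=0)
    return total, largest

def can_satisfy(free_blocks, request):
    """A request fits only if one block is big enough on its own."""
    _, largest = pool_stats(free_blocks)
    return request <= largest

fragmented = [8, 8, 8, 8]   # 32 words free, but no block larger than 8
```

Here a request for 20 words passes a check against the total (32) yet cannot be satisfied, because the largest block is only 8 words.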
[End of TCO 7.1258]
TCO-number: 7.1259
Written-by: RASPUZZI Creation-date: 17-Mar-88 19:26:36
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: CLUDGR
Problem:
LCLWAT scheduler test just doesn't work. Users hang in the
INFO% JSYS because of it.
Diagnosis:
The scheduler test data is not put in the right place before
MDISMSing to the scheduler test.
Solution:
Put scheduler test data in the left half of T1 and not the
right half. This may be the last 7.0 TCO. On to autopatch!
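
As a reminder of why the half matters, here is a small Python sketch of the two 18-bit halves of a 36-bit word. The helper names are hypothetical, and what the other half carries is left as an assumption; the sketch only shows that data stored in one half is invisible to code reading the other.

```python
# Illustrative only: the two 18-bit halves of a 36-bit word.
HALF_MASK = 0o777777

def pack(left, right):
    """Build a 36-bit word from a left half and a right half."""
    return ((left & HALF_MASK) << 18) | (right & HALF_MASK)

def left_half(word):
    return (word >> 18) & HALF_MASK

def right_half(word):
    return word & HALF_MASK

test_data = 0o1234
good = pack(test_data, 0)   # data where the reader expects it
bad = pack(0, test_data)    # same data, wrong half
```

Code that reads the left half of `bad` sees zero, not the test data.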
[End of TCO 7.1259]
TCO-number: 7.1260
Written-by: GSCOTT Creation-date: 18-Mar-88 10:26:40
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: JSYSM
Problem:
Jobs don't logout when logged out from another job.
Diagnosis:
When ELOGO (JSYSM) is called to do the logout it just sets the logout bit in the
top fork of the target job. This bit is in FKINT. However if the job is stuck
in TCOTST (because output buffers are full since the bozo hit BREAK from a LAT
session or terminal is ^Sed) FKINT is not looked at by the scheduler since you
never get out of TCOTST since you never output any characters to the terminal.
Solution:
Clear output buffers immediately after setting the logout bit in FKINT when
logging out the target job. This causes the bit in FKINT to be noticed by
the scheduler since you get out of TCOTST. Also there is a CFOBF in the logout
code that leaves AC1 setup with 1000 from the DISMS in the code that waits
for the "Killed by ..." message to complete.
[End of TCO 7.1260]
TCO-number: 7.1261
Written-by: GSCOTT Creation-date: 18-Mar-88 10:48:24
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: MNETDV
Problem:
NOADDR takes a DOB.
Diagnosis:
NOADDR just means that the file SYSTEM:INTERNET.ADDRESS is missing or owie.
We should not dump in this case.
Solution:
Set NOADDR normally not dumpable.
[End of TCO 7.1261]
TCO-number: 7.1262
Written-by: RASPUZZI Creation-date: 23-Mar-88 11:49:01
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: CLUDGR
Problem:
XBLTAL BUGHLTs.
Diagnosis:
CL.ENT is doing a stupid thing. Instead of following the design
spec and sending the local port's number over in the SCA buffers,
it is sending the destination's port in the SCA buffers. This can
wreak havoc on the remote system if 2 requests come from 2 different
systems with the same CLUID.
Solution:
Slap up CL.ENT into using MYPOR4 when slam dunking the CI node number
into SCA buffers when calling routine FILLIN.
[End of TCO 7.1262]
TCO-number: 7.1264
Written-by: RASPUZZI Creation-date: 29-Mar-88 14:42:08
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: IMPDV
Problem:
SKDPF1 and PITRAP BUGHLTs when booting an ARPAnet monitor.
Diagnosis:
Code in IMPDV is doing things in places that it shouldn't.
These places can get swapped out and IMPDV runs at interrupt
level.
Solution:
Repeat 0 out not needed code. Why is it not needed? Well, it
appears that the Internet fork does the work for us.
[End of TCO 7.1264]
TCO-number: 7.1266
Written-by: RASPUZZI Creation-date: 5-Apr-88 15:10:30
Edited-by: RASPUZZI Edit-date: 5-Apr-88 15:11:16
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: MEXEC
Related-QAR: 158
Problem:
When someone hits the ENABLE/DISK buttons, the system does not
ask "Why Reload?" or "Run CHECKD?"
Diagnosis:
Small oversight in TCO 7.1226.
Solution:
Check to see if the front end knew the time. If it didn't, then someone
hit enable disk or loaded the system fresh. In this case, flag that the
questions must be asked.
[End of TCO 7.1266]
TCO-number: 7.1267
Written-by: LOMARTIRE Creation-date: 7-Apr-88 10:57:43
Edit-checked: Yes Document: Yes TCO-tested: Yes
Maintenance-release: No Hardware-related: No
Program: Monitor
Routines-affected: DSKALC
Related-QAR: 149
Problem:
There is no way for a user to encrypt the system structure during the 143
dialog.
Diagnosis:
The saga continues....we have finally decided to put this in (again).
Solution:
Add a question to the 143 dialog to allow the user to set the system
structure either encrypted or unencrypted. This question, which will follow "Do
you want the default size bootstrap area?", will look like:
Do you want to enable password encryption for the system structure?
The help text printed if the user hits ? is:
[Type 'YES' to enable password encryption or
'NO' to disable password encryption]
Only YES (or Y) or NO (or N) can be entered at this point.
The Installation manual will have to change to reflect this new question.
[End of TCO 7.1267]
TCO-number: 7.1268
Written-by: RASPUZZI Creation-date: 7-Apr-88 15:11:30
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: Yes
Program: MONITOR
Routines-affected: PHYMSC
Problem:
When PHYMSC builds a UDB, it does not set the US.UNA (disk
is unavailable) bit. This is bad as the system now assumes
access to the disk when the UDB is built.
Diagnosis:
This may be causing bad side effects in the login structures
code. Some development is planned for the login structures
code to use the fact that US.UNA will be set at UDB creation.
This bit will be cleared when the HSC disk has been onlined
(unless some bozo puts a 16 bit HDA disk out there in which
case the bit will not get cleared).
Solution:
Add US.UNA to the array of bits that are set in UDBSTS during
UDB creation.
[End of TCO 7.1268]
TCO-number: 7.1270
Written-by: WADDINGTON Creation-date: 8-Apr-88 15:55:26
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: LATSRV
Problem: LATOP% .LARHC interrupts don't.
Diagnosis: We're grabbing the PSI channel from the wrong location in the
argument block.
Solution: Get PSI channel from correct location. Add a range check just
to be on the safe side.
[End of TCO 7.1270]
TCO-number: 7.1272
Written-by: GSCOTT Creation-date: 12-Apr-88 14:36:25
Edited-by: GSCOTT Edit-date: 12-Apr-88 14:48:07
Edit-checked: Yes Document: No TCO-tested: Yes
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: JSYSM
Problem:
When edit 7456 was installed to correct a number of accounting problems, the
path taken through ATACH with AT%TRM (aka "proxy attach") was not tested or
considered since no DEC software makes use of AT%TRM. Edit 7456 fixed a number
of accounting problems when session records were not written when session
critical data was changed (particularly when the account string or terminal
number was changed). Customers do use proxy attach and an SPR was the result.
The ATACH JSYS code writes a session record for the target job (specified in AC
1) immediately before the job is detached in preparation for attachment to the
target terminal (specified in AC 4). This is done to insure that a session
record is written showing the time used in the session during which the ATACH
JSYS was started. However, each time a proxy attach (AT%TRM) is done, a bad
session record is written which includes a blank username string, a zero
session start time, zero runtime, zero session elapsed time and other bad
fields.
Diagnosis:
At ATACHB, it has been determined that the target job (as specified in AC1) is
attached to a terminal and needs to be detached before it is attached to the
target terminal number (the controlling terminal or the terminal specified in
AC4 if AT%TRM is set). Before the job's terminal number is changed, a session
record reflecting the time used on the target job's current terminal must be
written.
First, the target job's JSB is mapped by calling routine MAPJSB, then routine
DETREC is called to write the session record. If a proxy attach and the
caller's job number is specified as the target job, MAPJSB is called with our
own job number. In this case, SETJSB (called by MAPJSB) maps nothing and
returns zero as the JSB offset rather than mapping the target job's JSB into
FPG1. DETREC depends on the target job's JSB mapping to FPG1, as noted by
milestones in MAPJSB. DETREC assumes that the JSB is mapped to FPG1 since the
offsets for the USAGE JSYS refer to FPG1A. Since a zero page is used to write
the session record instead of a JSB, the USAGE JSYS in DETREC passes lots of
zeroes to the accounting file (GIGO), resulting in a very strange session
record.
Solution:
If DETREC is called with a JSB offset of zero then we are writing a session
record for our own job, and routine DETSES should do the work instead of DETREC
(insert "JUMPE T1,DETSES" at DETREC).
[End of TCO 7.1272]
TCO-number: 7.1273
Written-by: RASPUZZI Creation-date: 12-Apr-88 14:43:59
Edited-by: RASPUZZI Edit-date: 12-Apr-88 14:45:52
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: CLUDGR
Problem:
When a system in the cluster crashes, all nodes get INFO%
errors even if they are waiting for information from a
different system than the crashing one.
Diagnosis:
CLWAKE is obviously braindead. It went ahead and woke up all
forks waiting for cluster information instead of the ones waiting
on the crashing system only.
Solution:
Make CLWAKE check to see if the node of the request is the same
as the node that is crashing. If so, then wake up the corresponding
fork. If not, let the request remain for an answer.
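
The selective wakeup can be sketched in a few lines of Python. The request list and the function name are invented for the illustration; the real CLWAKE works on monitor data structures not shown here.

```python
# Illustrative only: pending requests as (fork_id, node) pairs.
def split_on_crash(pending, crashed_node):
    """Return (forks to wake, requests left waiting)."""
    wake = [fork for fork, node in pending if node == crashed_node]
    keep = [(fork, node) for fork, node in pending if node != crashed_node]
    return wake, keep

pending = [(101, "A"), (102, "B"), (103, "A")]
wake, keep = split_on_crash(pending, "A")
```

Only the forks waiting on the crashed node are woken; the request aimed at node B stays queued for its answer.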
[End of TCO 7.1273]
TCO-number: 7.1274
Written-by: RASPUZZI Creation-date: 13-Apr-88 20:49:14
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: IMPDV
Problem:
TCO 7.1264 was installed without thinking (that happens).
Diagnosis:
SKDPF1s and PITRAPs only occur when HSTSTS is bigger than 400000
which is not the case in the clock tape monitor. Therefore, 7.1264
should not be in the clock tape monitor. The code that was removed,
does, in fact, appear to be OK.
Solution:
Restore the REPEAT 0ed code that 7.1264 took out.
[End of TCO 7.1274]
TCO-number: 7.1275
Written-by: GSCOTT Creation-date: 14-Apr-88 14:33:11
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: PHYM78
Problem:
TM8FKRs when they aren't deserved.
Diagnosis:
At TM8ZAP we do a TM CLEAR to the TM78 and then we enter a loop waiting
for TM READY to come up. This loop consists of a massbus register read
and a test for TM READY. When this loop doesn't get TM READY in 10000
(octal) tries, a TM8FKR is printed and TM8ZAP returns. Callers of TM8ZAP
don't really care if TM READY comes up (at startup time we will fail to
see some drives; after a TU FAULT, TM READY is checked before restarting
the I/O).
Solution:
Investigation shows that this loop counter may get as high as 30000 (octal)
before TM READY comes up. Further investigation shows that TOPS-blue uses
40000 as a loop counter when it wants to wait for TM READY after TM CLEAR
comes up. So, the best thing to do is to change the loop counter to 40000,
and if anyone gets a TM8FKR it will hopefully be from a broken TM78.
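
For readers who do not think in octal, the loop limits can be checked with a small Python model of a bounded polling loop. The device here is simulated and every name is hypothetical; only the three octal constants come from the text above.

```python
# Illustrative only: a bounded poll against a simulated device.
OLD_LIMIT = 0o10000   # 4096 tries
OBSERVED = 0o30000    # 12288 tries seen before ready
NEW_LIMIT = 0o40000   # 16384 tries

def make_device(ready_after):
    """Return a read function that reports ready on the Nth read."""
    state = {"reads": 0}
    def read_ready():
        state["reads"] += 1
        return state["reads"] >= ready_after
    return read_ready

def wait_ready(read_ready, limit):
    """Poll up to `limit` times; return the try count or None on timeout."""
    for tries in range(1, limit + 1):
        if read_ready():
            return tries
    return None
```

A device that needs 30000 (octal) reads times out under the old limit and is caught comfortably under the new one.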
[End of TCO 7.1275]
TCO-number: 7.1276
Written-by: RASPUZZI Creation-date: 20-Apr-88 10:22:45
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: Yes
Program: MONITOR
Routines-affected: PHYMSC PHYPAR DSKALC
Related-QAR: 170
Problem:
Sometimes when a system boots in a clustered environment, it
does not find the login structure.
Diagnosis:
It appears that MSCP activity has not settled enough in the dust to
let FNDLGS do its thing.
Solution:
Wait for 10 seconds at the top of FNDLGS first. Then introduce a new
bit (U1.NOL) that appears in the second status word of a unit's UDB.
This bit will be set upon creation of the UDB for a disk unit and
cleared when a unit is onlined. Have CHKUDB wait for this bit to be
cleared but only for MSCP disks.
[End of TCO 7.1276]
TCO-number: 7.1278
Written-by: RASPUZZI Creation-date: 20-Apr-88 11:14:24
Edit-checked: No Document: Yes TCO-tested: No
Maintenance-release: No Hardware-related: Yes
Program: MONITOR
Routines-affected: PHYSIO FREE
Problem:
Lots of ONSTR/OFFSTRs and RELRNG BUGHLTs.
Diagnosis:
ONSTRs and OFFSTRs are informational and RELRNGs are not serious
enough to be BUGHLTs (except in field test).
Solution:
For the official release, ONSTR/OFFSTRs will be under CIBUGX and
RELRNGs will be BUGCHKs. DOB% can be used to take a RELRNG dump.
[End of TCO 7.1278]
TCO-number: 7.1279
Written-by: GSCOTT Creation-date: 21-Apr-88 16:05:05
Edited-by: GSCOTT Edit-date: 21-Apr-88 16:10:49
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: JSYSM
Problem:
MOUNTR gives "accounting record not written: No such job" when a job logs out
with a tape or disk mounted. The system must be under load to see this
problem.
Diagnosis:
When a user logs out, the monitor sends an IPCF message to MOUNTR. When MOUNTR
gets this IPCF message, it issues USAGE records for the regulated structures or
magtapes that were mounted by the job. If the job finishes logging out before
MOUNTR does the USAGE JSYS, the USAGE fails with a "No such job" error. The
call to GL2LCL in UFNINI returns this error.
Solution:
The reason that GL2LCL is called is to validate the job and get the local job
number to put into the block that is queued to job 0 to update the checkpoint
file. The checkpoint file will not be updated when the USAGE record queued is
not a session type record (e.g. the USAGE function is .USENT). There is no
need to call GL2LCL when the checkpoint file is not being updated. The cure is
to check the entry type (as specified in AC1 of the USAGE JSYS call) and do not
call GL2LCL if the function is .USENT.
During the investigation into this problem it was discovered that there are a
number of places in the accounting code where "PS:[ACCOUNTS]" is referred to.
These will be changed to "ACCOUNT:" (as changed by the Login Structures
Project).
[End of TCO 7.1279]
TCO-number: 7.1280
Written-by: GSCOTT Creation-date: 25-Apr-88 13:20:10
Edited-by: GSCOTT Edit-date: 25-Apr-88 13:36:18
Edit-checked: Yes Document: No TCO-tested: Yes
Maintenance-release: Yes Hardware-related: No
Program: MONITOR
Routines-affected: DTESRV
Problem:
Edit 7449 (to 6.1) attempted to prevent SKDCL1s at power fail restart time by
replacing the routine DTBELL in DTESRV. After edit 7449 is installed, DN60s
only load every other try.
Diagnosis:
Edit 7449 replaced DTBELL which had a scheduler test in it for the usual DBTMR
scheduler test. Since at power fail restart time we are at scheduler level it
is impolite to try a scheduler test. At system boot time and other times that
DTBELL is called we are not at scheduler level. The replacement code doesn't
make use of a scheduler test (hence no pesky SKDCL1s), but it doesn't allow the
DN60 to be reliably loaded. We won't mention names here, but I guess you can
infer that some engineer didn't test edit 7449 thoroughly ("No more SKDCL1s?
Ship it!").
Since the only time that DTBELL is called at scheduler level is when we are in
a power fail restart, and since most power fail restarts fail on non-core
memory systems (and for other reasons not related to this problem), it seems
reasonable to put the old code back for release 7. A better fix will have to
go out on a future Autopatch tape.
Solution:
Remove edit 7449, and reinsert the old code.
[End of TCO 7.1280]
TCO-number: 7.1281
Written-by: WADDINGTON Creation-date: 3-May-88 14:51:29
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: LATSRV
Problem: Bad multicast messages being generated by TOPS-10/20
Diagnosis: We don't interlock the Multicast Buffer, so it can get changed
while it is still in the DLL, thereby corrupting the buffer
Solution: Interlock the Multicast buffer by setting bit DLL.FL in the UID
Field. Clean up LAINTX's handling of multicast buffers. Remove a (now)
redundant test from XMTDON. Test for the DLL.FL bit at the beginning of
LATXMC.
[End of TCO 7.1281]
TCO-number: 7.1282
Written-by: RASPUZZI Creation-date: 3-May-88 16:37:48
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: PAGEM
Related-SPR: 21881
Problem:
NSKDIS BUGHLTs.
Diagnosis:
Routine SECMAP goes NOSKED and then calls LCKOFN. LCKOFN is assumed
to be called OKSKED because it may wind up waiting for an OFN which
is locked to be freed. It does this by calling WTOFNS. SECMAP is
violating this rule by calling LCKOFN NOSKED.
Solution:
Instead of calling LCKOFN, have SECMAP simulate the code in line as
per RELP4 has done. This will ensure that WTOFNS will be called in
the correct state.
[End of TCO 7.1282]
TCO-number: 7.1283
Written-by: RASPUZZI Creation-date: 5-May-88 15:09:34
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: IPCF IPIPIP
Problem:
None observed but there are code changes to IPCF.MAC and IPIPIP.MAC
that users of the GTDOM% JSYS would like to have. Mainly, it is code
that makes page mode transfers work in monitor context.
Diagnosis:
As above.
Solution:
Add code to IPCF.MAC and make a routine global in IPIPIP.MAC .
[End of TCO 7.1283]
TCO-number: 7.1285
Written-by: LOMARTIRE Creation-date: 6-May-88 06:52:57
Edit-checked: Yes Document: No TCO-tested: Yes
Maintenance-release: No Hardware-related: No
Program: Monitor
Routines-affected: PAGUTL
Related-QAR: 9
Problem:
OFJFBD BUGHLTs.
Diagnosis:
Routine CHKLAC is used to insure that a long file is being opened
consistently with regards to its former short file access. If page table
zero is open on the system, then CHKACC is called to do the validity
check. If PT0 had previously only been opened unrestricted, and the new
opening is "real", then the PT0 access flags in SPTH must be updated to
reflect the PTT access. However, the wrong instruction was used to do this
and the OFN2XB bit for PT0 was getting cleared. This was being detected
later and the OFJFBD resulted.
Solution:
Change the XORM to an IORM to set the bits in SPTH of PT0.
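
Why an exclusive OR is the wrong tool for setting bits can be seen in a short Python sketch. It assumes, as the diagnosis suggests, that the mask contained a bit already set in the word; the flag values are hypothetical stand-ins, not the real SPTH bit assignments.

```python
# Illustrative only: XOR toggles, inclusive OR sets.
ALREADY_SET = 0b0100   # hypothetical stand-in for a bit like OFN2XB
NEW_ACCESS = 0b0001    # hypothetical access flag to be added

def update_with_xor(word, mask):
    """Toggle every mask bit: bits already set get cleared."""
    return word ^ mask

def update_with_or(word, mask):
    """Set every mask bit: bits already set stay set."""
    return word | mask

word = ALREADY_SET
mask = ALREADY_SET | NEW_ACCESS
```

With XOR the bit that was already on is turned off; with inclusive OR both bits end up set.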
[End of TCO 7.1285]
TCO-number: 7.1286
Written-by: LOMARTIRE Creation-date: 9-May-88 08:40:51
Edited-by: LOMARTIRE Edit-date: 9-May-88 10:28:47
Edit-checked: Yes Document: No TCO-tested: Yes
Maintenance-release: No Hardware-related: No
Program: Monitor
Routines-affected: ENQ
Related-TCO: 7.1072
Related-QAR: 200
Problem:
Jobs on different systems can end up waiting forever in ENQTST for the
same lock. Also, a job on one system is not always correctly notified of the
release of a lock from another system.
Diagnosis:
Routine QSKDRC is an alternate entry point to QSKD. It is called by Vote
Responder routine EVQSKD when an incoming Q-Block scheduling query is received.
The existing logic in QSKD does not take into account that a Lock-Block can now
have its first (and possibly only) Q-Block be unlocked. The existing code
incorrectly assumes that the first block is locked and this causes a "No" reply
to the incoming vote request.
TCO 7.1179 attempted to cut down on needless broadcasts for
non-cluster-wide locks. Unfortunately, when the last Q-Block is DEQed, the
EN.CLL bit is cleared in the Lock-Block (by routine QDLBFS) and then LOKSKD is
called to cause other nodes to be notified. But, since EN.CLL is cleared, no
notification will be sent.
Solution:
First, make QSKD smarter. If QSKDRC is called, check EN.LOK in the Q-Block
during the scanning process. If it is not set, then ignore that Q-Block in the
verification process.
Second, rearrange the code at QDEQ0 so that LOKSKD is called before
QDLBFS. In this way, the state of the Lock-Block will accurately reflect the
last locker (before being reset by QDLBFS).
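
The ordering problem in the second fix can be modelled in a few lines of Python. This is a sketch only: LockBlock, reset_lock and notify are hypothetical stand-ins for the Lock-Block, QDLBFS and LOKSKD, and the single cluster_wide flag stands in for EN.CLL.

```python
# Illustrative only: notification depends on a flag that the reset clears.
class LockBlock:
    def __init__(self):
        self.cluster_wide = True   # stands in for EN.CLL
        self.sent = []             # notifications that went out

def reset_lock(lock):
    """Stands in for QDLBFS: clears the cluster-wide flag."""
    lock.cluster_wide = False

def notify(lock):
    """Stands in for LOKSKD: notifies only cluster-wide locks."""
    if lock.cluster_wide:
        lock.sent.append("released")

def dequeue_old_order(lock):
    reset_lock(lock)
    notify(lock)      # flag already cleared, nothing is sent

def dequeue_new_order(lock):
    notify(lock)      # flag still reflects the last locker
    reset_lock(lock)
```

In the old order the notification is silently dropped; in the new order it goes out before the flag is reset.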
[End of TCO 7.1286]
TCO-number: 7.1287
Written-by: RASPUZZI Creation-date: 13-May-88 16:32:43
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: STG
Related-QAR: 211
Problem:
SKDPF1 BUGHLTs when a reverse LAT connection is completed.
Diagnosis:
LATSRV attempts to write into the DEVCHR table to assign the TTY
device to the job doing the host initiated connect. However, DEVCHR
is in the swappable monitor and may be swapped out at the time
the scheduler decides to do this.
Solution:
Make DEVCHR resident.
[End of TCO 7.1287]
TCO-number: 7.1288
Written-by: RASPUZZI Creation-date: 17-May-88 14:22:44
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: Yes
Program: MONITOR
Routines-affected: PHYKLP
Problem:
KLPNOM BUGHLTs and no useful code in PHYKLP being used.
Diagnosis:
Routine SANCHK is called when someone gives back a buffer to the
port. Unfortunately, this routine has been RETed because it caused
CI problems at one time. Since this code exists under the KLPDBG
conditional, it is beneficial to have it useable. No one runs a
DEBUG monitor unless they are having serious troubles and need
traces (like we do for our KLPNOMs).
Solution:
Remove the RET in SANCHK and have it perform the sanity checks it
was predestined to do.
[End of TCO 7.1288]
TCO-number: 7.1289
Written-by: WADDINGTON Creation-date: 19-May-88 14:58:56
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: LATSRV
Problem: COMMMS Bughalts (again...)
Diagnosis: TCO 7.1281 moved the testing/clearing of DLL.FL and SAV.FL from
scheduler level to interrupt level, thereby opening a large window where
transmit buffers could be released twice, which of course results in a
COMMMS.
Solution: Move the DLL.FL/SAV.FL code back to XMTDON where it belongs.
[End of TCO 7.1289]
TCO-number: 7.1290
Written-by: GSCOTT Creation-date: 19-May-88 16:08:05
Edited-by: GSCOTT Edit-date: 19-May-88 16:14:36
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: JSYSF
Problem:
When a structure is created and minimum password length is more than 6, then
the new structure's <OPERATOR> directory is full of zeroes (which is not a
legal format for a directory). Obvious result is DIRPG0 and inability to load
files into <OPERATOR>.
Diagnosis:
It would appear that the well intentioned minimum password length project has
reared its ugly head again. The check for minimum password length should not
be enforced if the CRDIR is done from monitor context (FILINI building initial
file structure directories in FILCRD).
Solution:
Check for previous context of monitor in CRDI2A plus some and don't do the
minimum password length check if CRDIR called from monitor. God, I'm really
glad we caught this one 2 days before the clock tape freeze!
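
The shape of the fix is a simple context test ahead of the length check. A minimal Python sketch, with hypothetical names and a boolean standing in for "previous context was monitor":

```python
# Illustrative only: skip the policy check for monitor-context callers.
def password_acceptable(password, min_length, from_monitor):
    """Enforce the minimum length only for non-monitor callers."""
    if from_monitor:
        return True
    return len(password) >= min_length
```

With a minimum length of 8, an empty password from a monitor-context caller is accepted, while the same password from a user-context caller is rejected.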
[End of TCO 7.1290]
TCO-number: 7.1292
Written-by: LOMARTIRE Creation-date: 24-May-88 16:18:36
Edited-by: LOMARTIRE Edit-date: 25-May-88 16:00:09
Edit-checked: Yes Document: No TCO-tested: Yes
Maintenance-release: No Hardware-related: No
Program: Monitor
Routines-affected: ENQSRV
Related-TCO: 7.1072
Related-QAR: 210
Problem:
Fork hung in EVWAIT waiting for a cluster-wide ENQ vote reply to be
returned.
Diagnosis:
It is possible to have the ENQ Answer Fork running on the same system
which is trying to issue a vote request. This is a very basic violation of the
rule that says: "If thy ENQ Answer Fork is running, then thy ENQ Database Lock
Token must be heldth on another node (thy one which issueith thy vote)." This
violation will allow multiple people to be fooling around with the VRQA. This
will make the voting results indeterminate and mess up the count of outstanding
replies (VOTVCT).
Solution:
Do not exit EVWAIT if a "No" reply is received. This will insure that the
Answer fork runs on all systems while the voting system has the Database Lock
Token. Also, do not exit ASK4IT if a "No" reply is received during the voting
loop. Instead, jump to ASKCHK to wait for replies from any votes sent thus far.
In ASKCHK, don't return success if a "No" reply was received.
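
The revised waiting rule, keep collecting replies after a "No" and fail only at the end, can be sketched as follows. The function and its list-of-replies input are assumptions for illustration; the real code counts outstanding replies in VOTVCT.

```python
# Illustrative only: drain every reply before deciding the outcome.
def collect_votes(replies):
    """Return (success, outstanding) after consuming all replies."""
    outstanding = len(replies)
    saw_no = False
    for reply in replies:
        outstanding -= 1
        if reply == "No":
            saw_no = True      # remember it, but keep waiting
    return (not saw_no), outstanding
```

Because the loop never exits early, the outstanding count always returns to zero, even when the vote fails.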
[End of TCO 7.1292]